fields & fieldlists interfaces and implementation #122

Merged
merged 20 commits into from
Jan 29, 2023

Conversation

Contributor

@jonemo jonemo commented Jan 19, 2023

This separates the Field and Fields interfaces, implementation, and tests from my WIP branch http-interface-update-2. The goals of doing this are

  1. to unblock @dlm6693's work on the AWS sigv4 signer
  2. to facilitate the discussion about quoting and escaping of header values with @nateprewitt

This PR only contains the interface and implementation of Field and Fields. All the work in actually using them for requests and responses happens on the http-interface-update-2 branch.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@jonemo jonemo requested review from nateprewitt and dlm6693 January 19, 2023 08:11
@@ -14,11 +14,13 @@
# TODO: move all of this out of _private


from collections import OrderedDict
Contributor

You don't need this - all dicts are insertion-ordered in Python now as part of the language contract, and you don't seem to be relying on any of the remaining niche features of OrderedDict.
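
A small illustration of the point above, using made-up header names: a plain `dict` keeps insertion order (guaranteed by the language since Python 3.7), while in-place reordering helpers such as `move_to_end` exist only on `OrderedDict`.

```python
from collections import OrderedDict

# Hypothetical header names, used only for illustration.
pairs = [("host", "example.com"), ("accept", "*/*"), ("x-trace", "abc")]

plain = dict(pairs)
ordered = OrderedDict(pairs)

# Both containers iterate in insertion order.
print(list(plain))    # ['host', 'accept', 'x-trace']
print(list(ordered))  # ['host', 'accept', 'x-trace']

# Reordering in place is one of the features only OrderedDict offers.
ordered.move_to_end("host")
print(list(ordered))  # ['accept', 'x-trace', 'host']
print(hasattr(plain, "move_to_end"))  # False
```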

Contributor

+1

Contributor Author

I know. And I agree that this should be discussed. For now this is taken verbatim from the spec which says this:

entries: OrderedDict[str, Field] # OrderedMap<String, Field>

See also this comment thread above: #122 (comment)

cc @nateprewitt to chime in since he wrote that spec

Contributor

The code in the specification was meant to convey the ideas to a broad audience of language teams. It has some oddities like this because the insertion-orderedness of Python dictionaries isn't necessarily widely known. So the pseudo-code examples shouldn't be taken as gospel beyond maybe interface framing.

Contributor Author

Hah, I'm glad we talked about it then. So we don't need to keep OrderedDict for its order-maintaining property. But now I know why the order matters, and it means that I need to fix my Fields.__eq__ implementation. And the new __eq__ will benefit from the equality definition of OrderedDict, so I'm going to keep it anyway!

(["v,a,l,1", "val2"], '"v,a,l,1",val2'),
# Double quotes are escaped with a single backslash. The second backslash below
# is for escaping the actual backslash in the string for Python.
(['"quotes"', "val2"], '\\"quotes\\",val2'),
Contributor

The RFC only mentions escaping in the context of quoted values. You also need to test for escaping backslashes.

Suggested change
- (['"quotes"', "val2"], '\\"quotes\\",val2'),
+ (['"quotes"', "val2"], '"\\"quotes\\"",val2'),
+ (["foo,bar\\", "val2"], '"foo,bar\\\\",val2'),

Contributor Author

@jonemo jonemo Jan 19, 2023

So if I only escape quotes inside of quoted values, then values with pre-existing quotes are indistinguishable from those where smithy-python added the quotes:

raw field value    serialized value in header
a,b,c              "a,b,c"
"a,b,c"            "a,b,c" or "\"a,b,c\"" ?

Is that what we want?

Contributor

You always escape double quotes, but when you do you need to wrap the whole element in double quotes. So:

['a', '"', 'b'] -> "a,\",b"

Contributor Author

I am surprised by that. Nowhere in the discussion so far have we considered putting quotes around multiple Field values.

jonemo and others added 2 commits January 19, 2023 20:05
Co-authored-by: Nate Prewitt <nate.prewitt@gmail.com>
Contributor Author

jonemo commented Jan 20, 2023

I updated the field value serialization logic in a way that incorporates all (?) the suggestions and constraints from sigv4 test cases. Specifically:

  • Spaces in values no longer trigger quoting. That only leaves commas as reason for quoting.
  • Double quotes are only escaped when quoting was applied. Same for backslashes.
  • When a string is already quoted (starts and ends with double quotes) it doesn't get modified, even if it contains additional double quotes or backslashes.

I understand how we arrived here, but I don't like it because:

  1. The logic for quoting and escaping is complicated. Even I as the person who just wrote it find it hard to predict the outcome.
  2. It isn't round-trip-able. Because we don't know if surrounding quotes were part of the original value or added by our serializer, the deserializer can't know whether to remove them. Not a problem per se, but the deserialization utility method that @JordonPhillips asked for will be another place with complex, difficult-to-predict behavior.
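
The three rules and the round-trip concern above can be sketched as follows. This is an illustration written from the bullet points in this comment only, not the code in the PR; `quote_and_escape` and `serialize_values` are hypothetical names.

```python
def quote_and_escape(value: str) -> str:
    """Serialize one field value following the three rules listed above."""
    # Rule 3: a value that already starts and ends with a double quote is
    # passed through untouched, whatever else it contains.
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value
    # Rule 1: only a comma triggers quoting; spaces do not.
    if "," not in value:
        return value
    # Rule 2: escape backslashes first, then double quotes, so that the
    # backslashes added for the quotes are not escaped a second time.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def serialize_values(values: list[str]) -> str:
    """Join the serialized values of one field with commas."""
    return ",".join(quote_and_escape(v) for v in values)


print(serialize_values(["v,a,l,1", "val2"]))  # "v,a,l,1",val2
print(serialize_values(["a b", 'say "hi"']))  # a b,say "hi"

# Two different raw values produce the same header text, so a deserializer
# cannot tell whether the surrounding quotes were part of the original value.
print(serialize_values(["a,b,c"]))    # "a,b,c"
print(serialize_values(['"a,b,c"']))  # "a,b,c"
```

The last two calls show the ambiguity directly: the serialized form alone does not carry enough information to recover the raw value.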

Contributor Author

jonemo commented Jan 20, 2023

Note to self, I haven't dealt with this yet:

    All field names are case insensitive and
    case-variance must be treated as equivalent.
    Names MAY be normalized but SHOULD be preserved
    for accuracy during transmission.
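
One way to satisfy the quoted requirement is to key lookups on a lower-cased name while storing the spelling that was supplied. The sketch below assumes this approach; `CaseInsensitiveFields` and its methods are hypothetical and not part of this PR.

```python
class CaseInsensitiveFields:
    """Hypothetical sketch: lookups ignore case, the given spelling is kept."""

    def __init__(self) -> None:
        # Maps the lower-cased name to (name as supplied, values).
        self._entries: dict[str, tuple[str, list[str]]] = {}

    def set(self, name: str, values: list[str]) -> None:
        self._entries[name.lower()] = (name, list(values))

    def get(self, name: str) -> list[str]:
        return self._entries[name.lower()][1]

    def names(self) -> list[str]:
        # Names are returned as supplied, for accuracy during transmission.
        return [original for original, _ in self._entries.values()]


headers = CaseInsensitiveFields()
headers.set("Content-Type", ["text/plain"])
print(headers.get("content-type"))  # ['text/plain']
print(headers.names())              # ['Content-Type']
```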

@jonemo jonemo requested review from dlm6693 and nateprewitt January 23, 2023 07:14

See :func:`Field.as_string` for quoting and escaping logic.
"""
CHARS_TO_QUOTE = (",", '"')
Contributor

Let's move this to a module-level constant.

Contributor Author

Why? It's an implementation detail of this function. I would prefer to not make it part of the public interface of this module.

Contributor

Why not? We don't need to redefine it every time this function gets called. It is a constant after all. Assuming that's why you made it upper case.

Contributor Author

Why not?

Because then others can start importing it and changing it becomes a backwards-incompatible change.

I made it lowercase to make it look less like a module-level constant.
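
On the cost side of this exchange, a quick check may be useful. In CPython, a tuple made only of literals is folded into a constant of the function's code object at compile time, so a function-local tuple like this one is not rebuilt on each call. The sketch below assumes CPython; `needs_quoting` is a hypothetical stand-in for the method under review.

```python
def needs_quoting(value: str) -> bool:
    # The tuple stays local to the function, so it is not importable
    # from the module and not part of its public interface.
    chars_to_quote = (",", '"')
    return any(char in value for char in chars_to_quote)


# CPython implementation detail: the literal tuple is stored once in the
# code object's constants instead of being constructed per call.
print((",", '"') in needs_quoting.__code__.co_consts)

print(needs_quoting("a,b"))  # True
print(needs_quoting("ab"))   # False
```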

JordonPhillips previously approved these changes Jan 24, 2023
@@ -193,16 +193,16 @@ def __eq__(self, other: object) -> bool:
)

     def __repr__(self) -> str:
-        return f'Field(name="{self.name}", value=[{self.value}], kind={self.kind})'
+        return f"Field(name={self.name!r}, value={self.value!r}, kind={self.kind!r})"
Contributor

TIL

Contributor

Yep, you can also use !s for str and !a for ascii.
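
A short demonstration of the three f-string conversions mentioned here, using made-up values: `!s` calls `str()`, `!r` calls `repr()`, and `!a` calls `ascii()`.

```python
name = "x-café"  # made-up field name containing a non-ASCII character

print(f"{name!s}")  # x-café
print(f"{name!r}")  # 'x-café'
print(f"{name!a}")  # 'x-caf\xe9'

# With !r, embedded double quotes stay unambiguous in a repr string,
# unlike hand-written quotes around the placeholder.
value = 'say "hi"'
print(f'name="{value}"')  # name="say "hi""
print(f"name={value!r}")  # name='say "hi"'
```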

Contributor

@nateprewitt nateprewitt left a comment

A few minor comments/questions but otherwise I think this looks good!

:param encoding: The string encoding to be used when converting the ``Field``
name and value from ``str`` to ``bytes`` for transmission.
"""
init_fields = [fld for fld in initial] if initial is not None else []
Contributor

Was there a reason we're creating a new list here instead of just using initial? Is the concern we're going to get a generator or some other unindexable value?

Contributor Author

The reason I kept this list comprehension here is that I wanted to allow any Iterable in the constructor signature. I could limit the type of initial to just list | None, or I could use list() instead of the list comprehension.
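
To illustrate why accepting any Iterable calls for materializing the input first: the constructor walks the fields more than once (to collect names, then to build the mapping), and a generator would be exhausted after the first pass. The sketch below uses plain `(name, value)` tuples instead of Field objects, and `build_entries` is a hypothetical helper, not the PR's constructor.

```python
from collections import Counter
from typing import Iterable, Optional


def build_entries(
    initial: Optional[Iterable[tuple[str, str]]] = None,
) -> dict[str, str]:
    # list() is equivalent to the list comprehension and makes the
    # input safe to iterate several times.
    items = list(initial) if initial is not None else []
    names = [name for name, _ in items]  # first pass
    duplicates = [name for name, count in Counter(names).items() if count > 1]
    if duplicates:
        raise ValueError(f"Field names must be unique: {', '.join(duplicates)}.")
    return dict(zip(names, (value for _, value in items)))  # second pass


# A generator works because it is materialized before the two passes.
print(build_entries(pair for pair in [("a", "1"), ("b", "2")]))
# {'a': '1', 'b': '2'}
```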

f"{', '.join(non_unique_names)}."
)
init_tuples = zip(init_field_names, init_fields)
self.entries: OrderedDict[str, interfaces.http.Field] = OrderedDict(init_tuples)
Contributor

I know we'd arrived at using an OrderedDict in a previous discussion. Was the reason that we'd get the following by default for header ordering?

    def __eq__(self, other):
        return dict.__eq__(self, other) and all(map(_eq, self, other))

Contributor Author

Hah, took me a minute to understand. Yes, after the introduction of __iter__ this snippet would indeed work.

And yes, I decided to stick with OrderedDict because it gives me the ordering check for free as part of the equality check:

>>> tuples1 = [('a', 1), ('b', 2)]
>>> tuples2 = [('b', 2), ('a', 1)]
>>> dict1, dict2 = dict(tuples1), dict(tuples2)
>>> od1, od2 = OrderedDict(tuples1), OrderedDict(tuples2)
>>> dict1 == dict2
True
>>> od1 == od2
False
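
For comparison, the same order-sensitive check can be spelled out for plain dicts by comparing key order explicitly. This is only a sketch of the alternative; `ordered_equal` is a hypothetical helper and not part of the PR.

```python
def ordered_equal(a: dict, b: dict) -> bool:
    # Same items (plain dict equality) and the same key order.
    return a == b and list(a) == list(b)


tuples1 = [("a", 1), ("b", 2)]
tuples2 = [("b", 2), ("a", 1)]

print(ordered_equal(dict(tuples1), dict(tuples1)))  # True
print(ordered_equal(dict(tuples1), dict(tuples2)))  # False
```

Using OrderedDict avoids having to write and test this extra comparison.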

Comment on lines +259 to +264
def get_by_type(self, kind: FieldPosition) -> list[interfaces.http.Field]:
    """Helper function for retrieving specific types of fields.

    Used to grab all headers or all trailers.
    """
    return [entry for entry in self.entries.values() if entry.kind is kind]
Contributor

I think this works fine for now. I am curious though if we'd ever want to track this on insertion/removal to avoid iterating every header each time.

Contributor Author

You mean make entries a dict of dicts that gets accessed like self.entries[FieldPosition.HEADER]["x-my-header-name"]? Of course doable, but would require double bookkeeping because the API also expects the complete list of all fields to be maintained in order.
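
A rough sketch of what that double bookkeeping could look like. Everything here is hypothetical: `IndexedFields`, `set_field`, and the simplified `Field` and `FieldPosition` stand-ins are for illustration and do not mirror the PR's classes.

```python
from dataclasses import dataclass
from enum import Enum


class FieldPosition(Enum):
    HEADER = 0
    TRAILER = 1


@dataclass
class Field:
    # Simplified stand-in for the PR's Field class.
    name: str
    kind: FieldPosition = FieldPosition.HEADER


class IndexedFields:
    """Keeps a per-kind index next to the complete, ordered mapping."""

    def __init__(self) -> None:
        self.entries: dict[str, Field] = {}
        self._by_kind: dict[FieldPosition, dict[str, Field]] = {
            kind: {} for kind in FieldPosition
        }

    def set_field(self, field: Field) -> None:
        previous = self.entries.get(field.name)
        if previous is not None and previous.kind is not field.kind:
            # Second bookkeeping step: drop the stale index entry.
            del self._by_kind[previous.kind][field.name]
        self.entries[field.name] = field
        self._by_kind[field.kind][field.name] = field

    def get_by_type(self, kind: FieldPosition) -> list[Field]:
        # No scan over all entries; only the matching index is read.
        return list(self._by_kind[kind].values())


fields = IndexedFields()
for f in (Field("a"), Field("t", FieldPosition.TRAILER), Field("b")):
    fields.set_field(f)

print([f.name for f in fields.get_by_type(FieldPosition.HEADER)])  # ['a', 'b']
print(list(fields.entries))  # ['a', 't', 'b']
```

Every mutation has to update both structures, and a field that changes kind lands at the end of its new index while keeping its place in `entries`, which is the kind of extra care the comment above refers to.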

Comment on lines +30 to +34
"""
Header field. In HTTP this is a header as defined in RFC 9114 Section 6.3.
Implementations of other protocols may use this FieldPosition for similar types
of metadata.
"""
Contributor

I'm curious where this pattern of including the docstring after what it's discussing comes from. I've only ever seen it in this repo and it strikes me as unintuitive. Am I missing a new convention? 😅

Contributor Author

I do this because many years ago I learned that this is the one way to make IDEs pick it up as the docstring. I always assumed, but never validated, that this is also true for documentation generators. It's surprisingly difficult to find documentation for this behavior, but this Stack Overflow answer suggests that I am not the only one who arrived at this conclusion.

A possible explanation for why it works this way is this: Consider the alternative of putting the enum entry docstring before the entry. The first entry's docstring would be immediately adjacent to the enum class docstring. Python would merge those into one string, and docs generators and IDEs wouldn't know which part belongs to the class and which to the entry:

class MyEnum(Enum):
    """My class docstring"""
    """... is directly next to my entry docstring!"""
    FIRST_ENTRY = 0
    """My second entry docstring"""
    SECOND_ENTRY = 1
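
For reference, a runnable check of what the interpreter itself retains from the class above. At runtime only the first string literal in the class body becomes `__doc__`; bare strings after it are evaluated and dropped rather than joined to it, so the ambiguity described here is one for tools that read the source (IDEs and documentation generators), not for the running program.

```python
from enum import Enum


class MyEnum(Enum):
    """My class docstring"""
    """... is directly next to my entry docstring!"""
    FIRST_ENTRY = 0
    """My second entry docstring"""
    SECOND_ENTRY = 1


# Only the first string is stored as the class docstring.
print(MyEnum.__doc__)  # My class docstring

# The bare strings do not create enum members either.
print([member.name for member in MyEnum])  # ['FIRST_ENTRY', 'SECOND_ENTRY']
```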

Contributor

@dlm6693 dlm6693 Jan 28, 2023

I also believe that, for doc generation, not putting it below the variable can cause issues depending on which generator you're using.

An alternative that I've come across, if preferred, is similar to how we format docstrings for functions:

class MyEnum(Enum):
    """My class docstring
    
    Attributes
    ----------
    FIRST_ENTRY: int
        First entry doc

    SECOND_ENTRY: int
        Second entry doc
    """

Co-authored-by: Nate Prewitt <nate.prewitt@gmail.com>
@jonemo jonemo requested a review from dlm6693 January 28, 2023 00:08
Contributor

@nateprewitt nateprewitt left a comment

Contributor Author

jonemo commented Jan 29, 2023

@dlm6693 Merging is currently blocked because your latest review is "changes requested". Anything you still want me to do here?

Contributor

dlm6693 commented Jan 29, 2023

@jonemo gonna take one final pass now

Contributor

@dlm6693 dlm6693 left a comment

📦

@jonemo jonemo merged commit 300a17d into develop Jan 29, 2023
@jonemo jonemo deleted the http-fields branch January 29, 2023 18:00
@jonemo jonemo mentioned this pull request Feb 21, 2023

4 participants