Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Helper API for per-request pipeline configuration #6378

Closed
chlowell opened this issue Jul 16, 2019 · 6 comments
Closed

Helper API for per-request pipeline configuration #6378

chlowell opened this issue Jul 16, 2019 · 6 comments
Assignees
Labels
Azure.Core Client This issue points to a problem in the data-plane of the library.

Comments

@chlowell
Copy link
Member

The pipeline API supports arbitrary per-run configuration of everything from policies to transport implementations via kwargs. This requires policies and transport abstractions to pop their expected configuration from kwargs before passing the collection to the next pipeline node. Failure to do so (e.g. by using get instead of pop) has been the root cause of several bugs.

A suggestion for preventing this in the future comes from this comment:

it could be worth considering ways to declaratively specify lists of kwargs and a helper API that can automatically extract their values and pop them from the kwargs before passing them to the next method/function

@chlowell chlowell added Azure.Core Client This issue points to a problem in the data-plane of the library. labels Jul 16, 2019
@xiangyan99 xiangyan99 added this to the [2020] January milestone Nov 1, 2019
@xiangyan99
Copy link
Member

We need a design review before updating the code.

@bryevdv
Copy link
Contributor

bryevdv commented Dec 11, 2019

The basic ideas is to move the responsibility for calling pop in to one place instead of requiring lots of individuals to remember doing it in 100 different places. There is a pretty wide range of ways this could be done, but starting simple, just something like:

class Options(object):
    def __init__(self, kwargs):  # note: no **
        for name in self._names:
            if name in kwargs:
                setattr(self, name, kwargs.pop(name))
            else:
                setattr(self, name, None) # Need to decide correct defaults approach

class MyOptions(Options):
    _names = [
        'timeout', 
        'max_retries',
    ]

Each thing (pipeline, policy, etc) that needs to pop kwargs defines one of these subclasses for itself, then:

In [17]: kw = dict(foo=10, bar=20, timeout=30)

In [18]: opts = MyOptions(kw)  # one right way for everyone to pop kw

In [19]: opts.timeout
Out[19]: 30

In [20]: opts.max_retries is None
Out[20]: True

In [21]: kw
Out[21]: {'foo': 10, 'bar': 20}

One question that needs a little thought is what to do about default values for names that are not in the passed-in kwargs. Is it OK to just always set to None? That's what the above does, but if more flexibility is needed, default values could also be specified on the subclass along with the names in a variety of ways.

From here you could get more fancy to any degree you want:

  • specify types, perform automatic type validation on kwarg values to print consistent and helpful error messages
  • specify help strings, make a sphinx plug-in that automatically rendered Options subclasses in docstrings

E.g. Bokeh does all of the above but also already had a typed property system that could be leveraged to implement things. Since this is all for use internally inside functions, I'd suggest starting simple and then it could always be expanded upon later without disrupting anything.

@xiangyan99
Copy link
Member

This is a good suggestion. But the problem is different transports/policies may vary a lot on how to consume the kwargs. e.g. in RequestIdPolicy, we may consume "request_id", "auto_request_id", "x-ms-client-request-id". We cannot give it a default value because we have different behaviors on it sets to None vs it is not set. Another example is CustomHookPolicy, we need to pop up "raw_response_hook" in on_request (because transport does not like it) and add it back in "on_response" (or we will lose the information)

@bryevdv
Copy link
Contributor

bryevdv commented Jan 6, 2020

But the problem is different transports/policies may vary a lot on how to consume the kwargs. e.g. in RequestIdPolicy, we may consume "request_id", "auto_request_id", "x-ms-client-request-id".

Just to clarify, I was not trying to suggest a single "options" class for all of azure.* that gets used everywhere. Rather, I imagined lots of "options" subclasses, e.g. even up to per transport/policy, each specifying only the options relevant to it. The win (in my mind) is having standard more-or-less declarative way for azure devs to express what options are, (which could if we really wanted also feed in to docs in an automated way) as well as having every transport/policy use a single tested codepath for extracting those in a consistent way.

We cannot give it a default value because we have different behaviors on it sets to None vs it is not set.

It's up to the policy to decide what to do with that information. An options class only needs to convey it. Off the top of my head, maybe:

class _Unset: pass

class Options(object):
    def __init__(self, kwargs):  # note: no **
        for name, default in self._opts:
            if name in kwargs:
                setattr(self, name, kwargs.pop(name))
            else:
                setattr(self, name, default) 

class MyOptions(Options):
    _opts = [
        ('timeout',  None), 
        ('max_retries', _Unset),
        ('debug', False)
    ]

Then, e.g.

In [3]: opts = MyOptions(dict(debug=True))

In [4]: opts.debug
Out[4]: True

In [5]: opts.timeout is None
Out[5]: True

In [6]: opts.max_retries is _Unset
Out[6]: True

That leaves the policy with knowledge that max_retries was explicitly not set, and it can do whatever is appropriate with that information. I'd definitely advocate for no adding any responsibility for interpretation or handling in these options classes. Just to pass on the necessary information and prune **kw in a predictable way.

As I said before there are lots of possible approaches, e.g. if you want to lessen the burden on azure devs, you could interpret single values as having _Unset implicitly as a convenience.

Another example is CustomHookPolicy, we need to pop up "raw_response_hook" in on_request (because transport does not like it) and add it back in "on_response" (or we will lose the information)

Seems orthogonal. CustomHookPolicy could use an "options" class to pop "raw_response_hook" in on_request, save that value, and then put it back in **kw in "on_response" manually. That's just something up toCustomHookPolicy to do because it makes sense for CustomHookPolicy to do.

@xiangyan99 xiangyan99 removed this from the [2020] January milestone Jan 7, 2020
@xiangyan99
Copy link
Member

This needs to revisit policies/transports one by one.

@xiangyan99 xiangyan99 self-assigned this Jan 23, 2020
Copy link

Hi @chlowell, we deeply appreciate your input into this project. Regrettably, this issue has remained inactive for over 2 years, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 18, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Mar 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Azure.Core Client This issue points to a problem in the data-plane of the library.
Projects
None yet
Development

No branches or pull requests

3 participants