
WIP: use proto-plus instead of _pb #332

Closed

Conversation

cguardia (Contributor):

This is a proof of concept PR to fix #319. This almost works, but there are four tests failing. The failures are of two types:

  1. The message.from_json method, which is the likeliest successor to the json_format.ParseDict call used with the _pb object, does not allow us to pass an "ignore_unknown_fields" flag to the parser, causing errors in tests that try to ignore unknown fields.

  2. The JSON produced by the standard json library differs slightly from the format generated by json_format, which causes tests that compare repr output to fail.

I don't much like the use of the json library in this PR and would prefer to avoid it if possible. Perhaps making a PR to proto-plus to make the to_json and from_json methods more flexible would be a good idea, along with possibly adding to_dict and from_dict methods there, which seems like it could be generally helpful.
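In the meantime, the "ignore unknown fields" behavior could be approximated on our side by filtering the resource dict before handing it to from_json. A minimal sketch, not the PR's actual code; the hypothetical known_fields set would come from the message descriptor in practice:

```python
import json

def dumps_known_fields(resource: dict, known_fields: set) -> str:
    """Drop keys the target message does not define, then serialize.

    A rough stand-in for what ignore_unknown_fields=True does in
    json_format.ParseDict.
    """
    filtered = {k: v for k, v in resource.items() if k in known_fields}
    return json.dumps(filtered)

# A resource carrying one field the message does not know about.
payload = dumps_known_fields(
    {"model_reference": {"model_id": "m1"}, "someNewApiField": 123},
    known_fields={"model_reference"},
)
```

The result could then be passed to types.Model.from_json(payload) without tripping over the unknown field.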

@cguardia cguardia requested a review from a team October 19, 2020 08:17
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Oct 19, 2020
@cguardia cguardia requested a review from tswast October 19, 2020 08:17
@cguardia cguardia added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Oct 19, 2020
tswast (Contributor) commented Oct 19, 2020:

I've filed googleapis/proto-plus-python#153 and googleapis/proto-plus-python#152 which will allow us to ignore unknown fields and avoid the use of the json module.

```python
# Before: parse into the raw protobuf message behind the proto-plus wrapper.
self._proto = json_format.ParseDict(
    resource, types.Model()._pb, ignore_unknown_fields=True
)
# After: parse through the proto-plus API instead.
self._proto = types.Model.from_json(json.dumps(resource))
```
Contributor:

Can we pass resource to the Model constructor here?

googleapis/proto-plus-python#152 (comment)

Contributor:

We'll still want googleapis/proto-plus-python#153 to support ignore_unknown_fields, though.

cguardia (Contributor, author):

I had already tried that during my testing. It gives a KeyError in the metaclass during __init__, right here:

https://github.com/googleapis/proto-plus-python/blob/master/proto/message.py#L425

I was using this code:

self._proto = types.Model(resource)

Contributor:

We should be able to update this to types.Model(resource, ignore_unknown_fields=True) now.

tswast (Contributor) commented Oct 20, 2020:

googleapis/proto-plus-python#154 adds ignore_unknown_fields to both the Model constructor and a to_dict method. Hopefully that means this PR won't be blocked by proto-plus once they make a release.

cguardia (Contributor, author):

They already released. I'm still getting some test errors here, but I think they are related to assumptions about how the pb json_format conversion worked. Hopefully I will have a working PR at some point tonight.

cguardia (Contributor, author):

@tswast Speaking of hidden json_format pb conversion behavior: somewhere along the line it converted key names to camelCase automatically, so many checks we have on resources fail because the IDs now use snake_case while the manually constructed test resources use camelCase. In some cases the camelCase names are part of the main code, as in the job and client modules. Dov said this has been a problem elsewhere. I'm still not familiar enough with the bigquery codebase to be completely sure which way to go here; my instinct is to convert everything to snake_case. What do you think?

tswast (Contributor) commented Oct 21, 2020:

json_format pb conversion behavior: somewhere along the line it converted key names to camelCase automatically, so many checks we have on resources fail due to the ids now using snake_case and the manually constructed test resources using camelCase.

@cguardia You mean it's taking the camelCase keys we set in tests and converting them to snake_case? I suspect the API will accept either in most cases, since the language guide says

Parsers accept both the lowerCamelCase name (or the one specified by the json_name option) and the original proto field name.

https://developers.google.com/protocol-buffers/docs/proto3#json

Does the same behavior happen with to_dict?
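The quoted rule means a tolerant parser can simply normalize either key style before field lookup. A quick plain-Python illustration of what "accepts both" amounts to (not protobuf's own implementation):

```python
import re

def to_snake(name: str) -> str:
    """lowerCamelCase -> snake_case, the original proto field naming."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

def normalize_keys(resource: dict) -> dict:
    """Accept either key style by normalizing every key to snake_case."""
    return {to_snake(k): v for k, v in resource.items()}
```

With this, {"friendlyName": "x"} and {"friendly_name": "x"} normalize to the same dict before any field lookup happens.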

cguardia (Contributor, author):

We're not using JSON anymore. The tests expect camelCase, but the constructor and to_dict return snake_case. I was thinking of getting rid of camelCase everywhere, but I'm unsure whether that would affect something else.

tswast (Contributor) commented Oct 21, 2020:

calls to the constructor and to_dict are returning snake_case. I was thinking of getting rid of camelCase everywhere, but unsure if that would affect something else.

This seems like a safe operation to me. We aren't ever sending that JSON representation to the API, so it shouldn't affect anything there.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Oct 22, 2020
cguardia (Contributor, author):

Well, this has been quite the rabbit hole. It turns out that not only the tests but some of the code as well depend on the automatic case conversion, so a lot of code would need to change to get rid of it. The json_format call also performs some data conversion, which again some of the code expects. I believe we have three options:

  1. Bite the bullet and go through all the code and tests until this is right. This is complicated by some API calls that want camelCase arguments, so each change has to be made carefully to avoid affecting those.
  2. Write some helper code to do the case conversion on our side and treat it as an API requirement.
  3. Use json.loads and json.dumps in combination with Model.to_json to do the conversion the way it was done before (mostly the current code, with a few additions).

@tswast what do you think?
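Option 2 could be as small as a recursive key renamer applied just before resources go to the API. A sketch under the assumption that only dict keys need renaming, not checked against the actual resource shapes:

```python
def snake_to_camel(name: str) -> str:
    """snake_case -> lowerCamelCase, the style the REST API documents."""
    first, *rest = name.split("_")
    return first + "".join(word.capitalize() for word in rest)

def camelize(resource):
    """Recursively rename dict keys; lists are walked, scalars pass through."""
    if isinstance(resource, dict):
        return {snake_to_camel(k): camelize(v) for k, v in resource.items()}
    if isinstance(resource, list):
        return [camelize(v) for v in resource]
    return resource
```

For example, camelize({"model_reference": {"model_id": "m1"}}) yields {"modelReference": {"modelId": "m1"}}, leaving the rest of the codebase free to stay snake_case internally.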

tswast (Contributor) commented Oct 26, 2020:

Options 1 and 3 sound like asking for trouble performance-wise. They would require several passes over the data in Python code to get things into the structure we want.

Bite the bullet and go through all the code and tests until this is right.

I don't recall what code relies on this. Could you send a pointer to the line on GitHub? Are there really all that many tests for this? I recall that all of the properties we use _proto for are read-only.

cguardia (Contributor, author):

I'm going to close this one and open another with the current approach.

@cguardia cguardia closed this Oct 28, 2020
Labels:
- api: bigquery — Issues related to the googleapis/python-bigquery API.
- cla: yes — This human has signed the Contributor License Agreement.
- do not merge — Indicates a pull request not ready for merge, due to either quality or timing.
Development

Successfully merging this pull request may close these issues.

Breaking Change: update model.py classes to use proto-plus instead of _pb
2 participants