This repository has been archived by the owner on Feb 14, 2023. It is now read-only.

Ce/recursive #1

Open
wants to merge 3 commits into
base: master

carlineng

OK, this is super gross, but here's the general idea:

The stack overflow happens while the code builds the schema: when it descends into a recursively defined protobuf, the recursion never terminates. I've defined corresponding "recursive" methods below which compare the FieldDescriptor of the parent message to that of the child message. If the parent and the child have the same message type (as in the case of a recursively defined protobuf), then the Spark type of the child message is set to just a String.
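To make the schema step concrete, here is a minimal sketch of the idea in Python. It is not the code in this PR: `Message`, `Field`, and `schema_for` are hypothetical stand-ins for the protobuf descriptors and the Spark schema builder, and the schema is modeled as a plain dict.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Field:
    """Hypothetical stand-in for a protobuf field descriptor."""
    name: str
    type_name: str                        # "int32", "string", or a message type name
    message: Optional["Message"] = None   # set only for message-typed fields


@dataclass
class Message:
    """Hypothetical stand-in for a protobuf message descriptor."""
    name: str
    fields: List[Field] = field(default_factory=list)


Schema = Dict[str, Union[str, "Schema"]]


def schema_for(msg: Message) -> Schema:
    """Build a nested schema, cutting off direct self-references."""
    schema: Schema = {}
    for f in msg.fields:
        if f.message is None:
            # Scalar field: keep its type as-is.
            schema[f.name] = f.type_name
        elif f.message.name == msg.name:
            # Child has the same type as its parent: stop descending
            # and fall back to a plain string column.
            schema[f.name] = "string"
        else:
            schema[f.name] = schema_for(f.message)
    return schema


# A message that contains a field of its own type.
node = Message("Node")
node.fields = [Field("value", "int32"), Field("child", "Node", node)]

node_schema = schema_for(node)
```

Without the `f.message.name == msg.name` branch, `schema_for(node)` would recurse until the interpreter gives up, which mirrors the stack overflow described above.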

In the step where the code transforms the protobuf to a DataFrame (messageToRow), I pass in the parent message on each call to toRowData. If the parent message has the same type as the child message (again, as in the case of a recursively defined protobuf), toRowData simply returns null for the child.
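The row-conversion step can be sketched the same way. Again, this is only an illustration under assumptions: `Msg` is a hypothetical stand-in for a protobuf message instance, rows are modeled as tuples, and `to_row_data` and `message_to_row` only borrow their names from toRowData and messageToRow.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Msg:
    """Hypothetical stand-in for a protobuf message instance."""
    type_name: str
    values: Dict[str, Any]


def to_row_data(value: Any, parent: Optional[Msg] = None) -> Any:
    """Convert a field value, given the message that contains it."""
    if isinstance(value, Msg):
        if parent is not None and parent.type_name == value.type_name:
            # Child has the same type as its parent: emit null
            # instead of descending further.
            return None
        # Descend, passing the current message along as the parent.
        return tuple(to_row_data(v, parent=value) for v in value.values.values())
    return value


def message_to_row(msg: Msg) -> tuple:
    """Top-level entry point: the root message has no parent."""
    return to_row_data(msg)


tree = Msg("Node", {"value": 1, "child": Msg("Node", {"value": 2, "child": None})})
row = message_to_row(tree)
```

Here the nested `Node` is dropped entirely, so `row` keeps only the top-level `value` and a null in place of `child`.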

Note that this only works in simple cases of recursively defined protobufs, where the type of the child is the same as the type of its direct parent. It will still barf when the recursion is indirect, i.e. when a grandchild message has the same type as the grandparent (e.g., an Event which contains a View, which contains an Event). I don't believe we have any cases of that, and we can discourage it from happening, but I don't think we can guarantee it.
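For the Event/View case, one possible generalization (not implemented in this PR) is to track every ancestor type on the way down rather than only the direct parent. The sketch below is hypothetical: `TYPES` is a made-up registry mapping a message type name to its fields, and the schema is again a plain dict.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical registry: message type name -> list of (field name, field type).
TYPES: Dict[str, List[Tuple[str, str]]] = {
    "Event": [("id", "int32"), ("view", "View")],
    "View": [("url", "string"), ("source", "Event")],
}

Schema = Dict[str, Union[str, "Schema"]]


def schema_for(
    type_name: str,
    types: Dict[str, List[Tuple[str, str]]],
    ancestors: FrozenSet[str] = frozenset(),
) -> Schema:
    """Build a nested schema, cutting off any field whose type is an ancestor."""
    ancestors = ancestors | {type_name}
    schema: Schema = {}
    for field_name, field_type in types[type_name]:
        if field_type not in types:
            # Scalar field.
            schema[field_name] = field_type
        elif field_type in ancestors:
            # This type already appears higher up the chain:
            # stop here and fall back to a string.
            schema[field_name] = "string"
        else:
            schema[field_name] = schema_for(field_type, types, ancestors)
    return schema


event_schema = schema_for("Event", TYPES)
```

A check against the direct parent alone would not stop here, because `View.source` has type `Event` while its parent is `View`. The ancestor set catches it at the cost of carrying a little extra state through the recursion.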

@carlineng carlineng requested a review from drewrobb October 18, 2017 01:18