Replies: 10 comments
-
Related discussion: @visch - Do you primarily have this in mind for top-level properties or for sub-properties? If only for sub-properties, I think it makes sense to have an "any valid JSON object" data type that does not need to be recursively defined. (We don't have support for this as of yet in the SDK, but the Postgres tap probably has some prior art since JSON columns are supported.)
For top-level properties, it would be trickier to leave these undefined/unnamed, since most targets will at least need to know the top-level column list in the
If the tap data is fully variant, with no prior contracts, presumably a tap developer could mitigate the above by sending a single
What do you think of this approach?
-
@aaronsteers I like that, and that's probably the behavior I'd expect, for example, from a hypothetical
-
Yeah, that makes a lot of sense, I don't know why I didn't think of it! We could add an example of an unknown schema to the samples. It's straightforward, something like:

    {
      "type": "object",
      "properties": {
        "raw": {"type": "object"}
      }
    }

Using
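
For illustration, here is a minimal sketch of the wrapping idea discussed above, assuming a stream whose schema declares a single `raw` object property. The names `RAW_SCHEMA`, `wrap_record`, and `record_message` are hypothetical and are not part of the SDK; only the general shape of a Singer `RECORD` message is taken as given.

```python
import json

# Hypothetical stream schema: one top-level "raw" property holding any JSON object.
RAW_SCHEMA = {
    "type": "object",
    "properties": {
        "raw": {"type": "object"},
    },
}


def wrap_record(record: dict) -> dict:
    """Nest a schemaless record under the single declared "raw" key."""
    return {"raw": record}


def record_message(stream: str, record: dict) -> str:
    """Serialize a Singer-style RECORD message for the wrapped record."""
    message = {"type": "RECORD", "stream": stream, "record": wrap_record(record)}
    return json.dumps(message)


if __name__ == "__main__":
    # Two documents with different shapes still match the same stream schema.
    print(record_message("events", {"id": 1, "tags": ["a", "b"]}))
    print(record_message("events", {"name": "x", "nested": {"k": None}}))
```

The trade-off is that every target sees exactly one declared top-level column, at the cost of pushing all structure into that column.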
-
The tap mongo is not hypothetical 🎉
I don't do this though; I deferred it to the target (which I also built for BigQuery). MongoDB data that is not wrapped in a dict could be sent to blob storage and be valid, and more representative of its source structure, without further processing, so I didn't take the wrap approach in the tap. My initial impression is that it makes more sense for the target to handle extra props via a variant-type approach than for the tap to nest its variant data. I guess the latter might play better with some existing targets, but I would be interested to know exactly which. 🤔
-
@z3z1ma that is awesome! I see you're using
@aaronsteers I think
-
Awesome tap @z3z1ma :D I messed with
-
@visch yeah, I'm suggesting we use that JSON schema feature to give control over property pruning. But you're right, I don't see how record properties are not removed in @z3z1ma's tap 🤔
-
That's a good point 😄 Here is how I've been doing it in my setup
Alternatively, it could be more efficient from a CPU-cycle perspective to monkey-patch conform record types to a no-op.
-
@z3z1ma that makes sense, so you're shimming the fields in the schema, just with an empty type object 👍
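
As a rough sketch of what such shimming could look like (this is not @z3z1ma's actual code, which is not shown in this thread): every top-level key seen in a record but missing from the schema gets an empty `{}` entry, so schema-based pruning no longer removes it. The helper name `shim_schema` is hypothetical.

```python
def shim_schema(schema: dict, record: dict) -> dict:
    """Return a copy of `schema` with an empty ({}) property entry
    for every top-level record key that the schema does not declare."""
    properties = dict(schema.get("properties", {}))
    for key in record:
        # Existing declarations are kept; only unknown keys get the empty shim.
        properties.setdefault(key, {})
    return {**schema, "properties": properties}


if __name__ == "__main__":
    base = {"type": "object", "properties": {"_id": {"type": "string"}}}
    doc = {"_id": "abc", "payload": {"any": "shape"}}
    print(shim_schema(base, doc))
```

In JSON Schema, an empty schema `{}` accepts any value, which is why it works as a placeholder for fields of unknown type.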
-
Coming back to this after quite a while, and after reading and rereading the
@edgarrmondragon - What do you think of the above overall? Would you tweak or change anything in the above?
-
Right now in https://github.com/meltano/sdk/blob/main/singer_sdk/helpers/_typing.py#L198 we ignore any properties that do not exist in the schema and print a message. While this is nice in most cases, there are cases where you would rather have the tap print the data to the Record even if there is no corresponding schema. Today this works with nested objects, which is a workaround.
A big scary config item like
IGNORE_SCHEMA_THIS_IS_SCARY_COULD_BREAK_TARGETS
would be fine as well. What do you all think?
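
To make the proposal concrete, here is a simplified sketch of the behavior being discussed. It is not the SDK's implementation: `prune_record` and its `keep_unknown` flag are hypothetical stand-ins for the existing pruning logic and for the proposed opt-out setting.

```python
import logging

logger = logging.getLogger(__name__)


def prune_record(record: dict, schema: dict, keep_unknown: bool = False) -> dict:
    """Drop top-level properties that the schema does not declare.

    With keep_unknown=True (a hypothetical opt-out), undeclared properties
    pass through untouched, which may break targets that rely on the schema.
    """
    declared = schema.get("properties", {})
    result = {}
    for key, value in record.items():
        if key in declared or keep_unknown:
            result[key] = value
        else:
            logger.warning("Property '%s' is not in the schema; dropping it.", key)
    return result


if __name__ == "__main__":
    schema = {"type": "object", "properties": {"id": {"type": "integer"}}}
    record = {"id": 1, "surprise": "value"}
    print(prune_record(record, schema))                     # default: pruned
    print(prune_record(record, schema, keep_unknown=True))  # opt-out: kept
```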