Replies: 2 comments
-
@dmosorast - Thanks for starting this discussion! I agree we should have standard or best-practice recommendations here. I have always found it confusing when columns are created in the target even though those columns were deselected. In some cases, this can even raise security red flags - for instance, if you explicitly want to remove PII and you end up with a target table that nevertheless contains those excluded columns. An auditor could of course query the table to see there are no values there, but this raises trust questions and severely limits the confidence a cursory review of the table's columns can give. (Speaking from actual past experience here.)
I don't think this is feasible, since it requires prior knowledge of all the records' values at the point when the table is created - that is, before any records have been emitted.
I vote yes - at least as a suggested best practice. (The SDK actually does this by default now, which means the developer and user can expect this filtering automatically if the tap is built on the SDK.)
-
Thanks AJ, that's a good point on the PII and trust issue. I'm also leaning yes as a Best Practice. I think the thing that threw me for a loop is that I didn't realize the targets were building the DDL right out of the schema message; I thought they'd be waiting until they got data and translating it into DDL at that time, with the guidance of the matching schema. However, as far as code complexity goes, it does make sense to directly translate the schema to a DDL statement in the target of choice, and filtering the schema wouldn't inhibit the other, more reactive approach to DDL generation.
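To make the "DDL right out of the schema message" point concrete, here is a rough sketch of what such a translation could look like. It is not taken from any particular target: the `JSON_TO_SQL` mapping and the `schema_message_to_ddl` helper are hypothetical names, and the type mapping is a simplifying assumption. The takeaway is that every property in the SCHEMA message becomes a column, so a property the tap filtered out never reaches the table.

```python
# Hypothetical, simplified JSON Schema -> SQL type mapping (an assumption,
# not the mapping used by any real target).
JSON_TO_SQL = {
    "integer": "BIGINT",
    "number": "DOUBLE PRECISION",
    "boolean": "BOOLEAN",
    "string": "TEXT",
}


def schema_message_to_ddl(message):
    """Build a CREATE TABLE statement from a Singer SCHEMA message.

    Only flat, top-level properties are handled in this sketch.
    """
    columns = []
    for name, prop in message["schema"].get("properties", {}).items():
        types = prop.get("type", "string")
        if isinstance(types, str):
            types = [types]
        # Ignore "null" when picking the column type; fall back to TEXT.
        non_null = [t for t in types if t != "null"]
        sql_type = JSON_TO_SQL.get(non_null[0], "TEXT") if non_null else "TEXT"
        columns.append('"{}" {}'.format(name, sql_type))
    return 'CREATE TABLE "{}" ({})'.format(message["stream"], ", ".join(columns))


schema_message = {
    "type": "SCHEMA",
    "stream": "customers",
    "key_properties": ["id"],
    "schema": {
        "properties": {
            "id": {"type": ["integer"]},
            "name": {"type": ["null", "string"]},
        }
    },
}
ddl = schema_message_to_ddl(schema_message)
```

A target that instead waits for records before generating DDL could still consult the same filtered schema, so the two approaches don't conflict.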
-
We've gotten a PR on tap-stripe about filtering the SCHEMA messages based on field selection so that targets don't have to create null columns if the columns are not selected. This makes sense, but I'm not sure if it's a standard (the singer-io/getting-started repository doesn't have anything about this when it discusses field selection). With the general lack of guidance around how to build targets, this also doesn't really help that space either.
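For anyone who wants to picture the filtering, here is a minimal sketch of the idea. It is not the code from the tap-stripe PR: `filter_schema_by_selection` is a hypothetical helper, it only looks at top-level properties, and it assumes the usual Singer catalog metadata layout (entries with a `breadcrumb` and a `metadata` dict carrying `selected` and `inclusion`).

```python
import copy


def filter_schema_by_selection(schema, metadata):
    """Return a copy of `schema` containing only the selected properties.

    A property is kept if its metadata marks it `"selected": True` or
    `"inclusion": "automatic"`. Nested properties are not handled here.
    """
    keep = set()
    for entry in metadata:
        breadcrumb = entry.get("breadcrumb", [])
        # Top-level field entries look like ["properties", "<field name>"].
        if len(breadcrumb) != 2 or breadcrumb[0] != "properties":
            continue
        md = entry.get("metadata", {})
        if md.get("inclusion") == "automatic" or md.get("selected") is True:
            keep.add(breadcrumb[1])

    filtered = dict(schema)
    filtered["properties"] = {
        name: copy.deepcopy(prop)
        for name, prop in schema.get("properties", {}).items()
        if name in keep
    }
    return filtered


stream_schema = {
    "type": "object",
    "properties": {
        "id": {"type": ["integer"]},
        "email": {"type": ["null", "string"]},
        "name": {"type": ["null", "string"]},
    },
}
stream_metadata = [
    {"breadcrumb": [], "metadata": {"selected": True}},
    {"breadcrumb": ["properties", "id"], "metadata": {"inclusion": "automatic"}},
    {"breadcrumb": ["properties", "email"], "metadata": {"inclusion": "available", "selected": False}},
    {"breadcrumb": ["properties", "name"], "metadata": {"inclusion": "available", "selected": True}},
]
filtered_schema = filter_schema_by_selection(stream_schema, stream_metadata)
```

The tap would then emit `filtered_schema` in its SCHEMA message, so a target that builds its table from that message never sees the deselected `email` field.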
As part of my little conquest to aggregate "Standards" (things that live in the library/best-practices space above the strict "Singer Messaging Specification"), I've added this to that initiative's issue-tracking comment.
Since I'm less of a target developer, I thought it'd be good to break open this space and start up a discussion thread here to get everyone else's perspective on the practice. Guiding questions: