Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for activate version #72

Closed
Tracked by #4
visch opened this issue Jan 16, 2023 · 17 comments · Fixed by #89
Closed
Tracked by #4

Support for activate version #72

visch opened this issue Jan 16, 2023 · 17 comments · Fixed by #89
Assignees

Comments

@visch
Copy link
Member

visch commented Jan 16, 2023

No description provided.

@aaronsteers
Copy link
Contributor

aaronsteers commented Jan 17, 2023

Related PR:

In theory, this was mostly complete, but I never had time to run manual or automated tests.

@aaronsteers
Copy link
Contributor

aaronsteers commented Jan 21, 2023

@visch and @edgarrmondragon - Pardon my errant comment above. That WIP PR is for the tap-side implementation.

Those challenges around sending ACTIVATE_VERSION message and the enabling version integer property are unrelated to the needs here on the target.

For target-side implementation, I think we want ideally a 'deletion' but with two possible implementations:

  1. Only if something like hard_delete=True in config, then fully delete rows that have expired when an ACTIVATE_VERSION message is received.
  2. In all other cases, if ACTIVATE_VERSION is received, then update records deleted records and set the _sdc_deleted_at metadata column in the target table to the current timestamp. Users can then find the active rows by filtering for WHERE _sdc_deleted_at IS NOT NULL.

If the above is correct, and if soft delete behavior is the right "safe default", we can chose to only implement the second behavior in this first pass, and the hard_delete config and corresponding behavior could be added as a later addition.

@visch
Copy link
Member Author

visch commented Jan 23, 2023

@aaronsteers

https://github.com/meltano/sdk/blob/2def28b0a210bed70c5893a189c7c16b160508d0/singer_sdk/sinks/sql.py#L340-L390 has an implementation already for targets. Generally the flow of this function is:

  1. Checks to be sure the table already exists if it doesn't then it just moves on (good)
  2. Creates the version_column _sdc_table_version if it doesn't exist (good)
  3. Checks to see if hard_delete is set to something that isn't False (it will also default to True if not set), if so then it will do hard deletes (Seems good, but it doesn't work because the _sdc_table_version column is never set)
  4. Check to see if the soft_delete_column_name exists ( _sdc_deleted_at ) if hard_delete isn't True (good)

But it doesn't work right now because the _sdc_table_version column doesn't have any values set. Which leads us to a spec question.

What's not clear is what we should be setting the "version" column value to, right now the target isn't setting the version column data to anything. For the existing tap-postgres transferwise variant (as you can see from the records here https://github.com/MeltanoLabs/target-postgres/pull/89/files#diff-3467b68a9d59ed0029b384880f77d1d075c8768dce4abc9d39327aa92f6b47b3 ) the tap places the version in a column that's not a part of the record object itself called version. We're currently not doing anything with this column. The question really is what is the spec for activate_version? Should we follow transferwise's lead here and specify the version outside of the record object in a RECORD message?

I think meltano/meltano#2463 was supposed to address this but as meltano/meltano#2463 (comment) mentions we need to expand this spec some, and in this case we haven't addressed the issues I need to move forward here.

Next steps:

  1. Override activate_version to implement the spec that I believe to be correct in target-postgres
  2. Get some feedback from the Meltano team and pick a spec to move forward with
  3. Get a PR in to update the activate_version spec on the hub
  4. Get an issue into the SDK to update activate version

@visch
Copy link
Member Author

visch commented Jan 23, 2023

@aaronsteers Just rubberduckied this and I think I understand ignore this for a second I"ll get it updated. I think the implementation may be correct actually!

Comment is now updated!

@aaronsteers
Copy link
Contributor

aaronsteers commented Jan 23, 2023

Thanks, @visch. My understanding is that records should come through from the source with a _sdc_table_version property or similar, but I could be mistaken on that front.

https://sdk.meltano.com/en/latest/implementation/record_metadata.html

And it's nice to see that a lot of the code may already be written. What might be useful is a manual test to see how it performs. Some of that code may have been written optimistically and not ever actually tested with a proper manual or unit test. 🙃

@visch
Copy link
Member Author

visch commented Jan 23, 2023

Thanks, @visch. My understanding is that records should come through from the source with a _sdc_table_version property or similar, but I could be mistaken on that front.

https://sdk.meltano.com/en/latest/implementation/record_metadata.html

And it's nice to see that a lot of the code may already be written. What might be useful is a manual test to see how it performs. Some of that code may have been written optimistically and not ever actually tested with a proper manual or unit test. 🙃

Yeah I thought that too, just updated the comment above to be as accurate as I know right now. It looks like _sdc_table_version isn't in the record object of a RECORD message right now for transferwise variants. So we may want to implement the "correct" spec and then maybe allow for the transferwise style to work too? I'm not sure

@aaronsteers
Copy link
Contributor

It looks like _sdc_table_version isn't in the record object of a RECORD message right now for transferwise variants.

Is it perhaps in a version field which is top-level of the RECORD message?

@aaronsteers
Copy link
Contributor

aaronsteers commented Jan 23, 2023

@edgarrmondragon
Copy link
Member

@aaronsteers @visch I think we should follow the pipelinewise version of the version extension and put it at the top-level of the RECORD message, with optional support for the _sdc_table_version record-level field, and a clear deprecation timeline in case someone's already relying on this (probably nobody, really).

@visch
Copy link
Member Author

visch commented Jan 23, 2023

@edgarrmondragon and @aaronsteers thank you both!

It looks like the SDK does have support for this baked in via the metadata columns ( if add_record_metadata is set via Config) . So if metadata columns are enabled activate_version messages kind of work). I'm writing some tests now and tweaking the activate version function slightly to work with the use cases I've hit in my testing / tweaking.

Next open question I have is if add_record_metadata should be our flag to turning on/off activate version messages? And do we want a config value to turn on and off activate version messages.

I'll implement the way that's easiest right now with the way the SDK is built and put it in the PR for you all to see and we can tweak it.

@aaronsteers
Copy link
Contributor

aaronsteers commented Jan 23, 2023

@visch - Because record metadata is required for some critical operations, I'd be fine with enabling by default for our default implementation, and especially for SQLTarget subclasses. Developers should be able to redefine the setting in the case they want to disable by default.

Another developer-side customization lever would be to drive our behavior based on the presence of a capabilities array including activate-version or soft-delete/hard-delete. We could default to supporting specific capabilities, and at the same time check that the developer has not removed them from the capabilities array when deciding which code path to take.

Lots of options here and we could make it an office hours or bespoke zoom topic if helpful.

@visch
Copy link
Member Author

visch commented Jan 23, 2023

@aaronsteers / @edgarrmondragon https://github.com/MeltanoLabs/target-postgres/pull/89/files works, I have 2 more tests I thought about that seem pretty scary that I want to implement. But if you all want to take a look now that would be helpful!

@visch
Copy link
Member Author

visch commented Jan 24, 2023

@aaronsteers / @edgarrmondragon https://github.com/MeltanoLabs/target-postgres/pull/89/files works, I have 2 more tests I thought about that seem pretty scary that I want to implement. But if you all want to take a look now that would be helpful!

@aaronsteers Yes what I worried about is true we do have an issue with activate_version. I'm going to have to do some spec writing here to get this cleared up and diving.

@TyShkan
Copy link

TyShkan commented Oct 25, 2023

What is the current state of ACTIVATE_VERSION message type for taps/targets? What kind of issues are still unresolved?

@edgarrmondragon
Copy link
Member

@TyShkan in the singer SDK it's waiting to be implemented on both ends: taps don't currently emit the message, and targets in general don't process it.

@TyShkan
Copy link

TyShkan commented Oct 26, 2023

@edgarrmondragon is there any issues left on the SDK side?

We've had a conversation with @visch in Slack and I'm trying to understand if there is a blocker and where it is.

@edgarrmondragon
Copy link
Member

@edgarrmondragon is there any issues left on the SDK side?

We've had a conversation with @visch in Slack and I'm trying to understand if there is a blocker and where it is.

@TyShkan No real blockers. The relevant issue is meltano/sdk#18 and meltano/sdk#985 is a PR that went stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants