Evaluate the performance impact of replication filtering #1778

Open
balegas opened this issue Oct 1, 2024 · 0 comments

balegas commented Oct 1, 2024

We're setting replication slot filtering based on the shapes requested from Electric, so that Postgres doesn't evaluate rows that won't match any shape. This approach doesn't seem sustainable as the number of shapes grows. See also #1774 (comment).

We should evaluate the impact of this strategy and compare it with a filtering approach that only filters out columns (and tables) that aren't used by any shape.
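
To make the comparison concrete, here is a minimal sketch of how the two filtering strategies could translate into publication DDL on Postgres 15+. The `Shape` class and both helper functions are hypothetical and are not Electric's actual implementation; identifiers and predicates are interpolated without quoting, so this is for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Shape:
    """Hypothetical stand-in for a shape: table, optional where clause, optional columns."""
    table: str
    where: Optional[str] = None
    columns: Optional[tuple[str, ...]] = None  # None means "all columns"


def _group_by_table(shapes: list[Shape]) -> dict[str, list[Shape]]:
    grouped: dict[str, list[Shape]] = {}
    for shape in shapes:
        grouped.setdefault(shape.table, []).append(shape)
    return grouped


def row_filter_sql(publication: str, shapes: list[Shape]) -> str:
    """Row filtering: each table gets the OR of every shape's where clause."""
    parts = []
    for table, group in _group_by_table(shapes).items():
        if any(s.where is None for s in group):
            # A shape without a where clause needs every row of the table.
            parts.append(table)
        else:
            unique = list(dict.fromkeys(s.where for s in group))  # dedupe, keep order
            predicate = " OR ".join(f"({w})" for w in unique)
            parts.append(f"{table} WHERE ({predicate})")
    return f"ALTER PUBLICATION {publication} SET TABLE " + ", ".join(parts)


def relation_filter_sql(publication: str, shapes: list[Shape]) -> str:
    """Column/table filtering: publish only tables and columns used by some shape."""
    parts = []
    for table, group in _group_by_table(shapes).items():
        if any(s.columns is None for s in group):
            parts.append(table)
        else:
            # Union of requested columns. Note that Postgres requires a column
            # list to include the replica identity columns when the publication
            # publishes UPDATE or DELETE.
            cols = list(dict.fromkeys(c for s in group for c in s.columns))
            parts.append(f"{table} ({', '.join(cols)})")
    return f"ALTER PUBLICATION {publication} SET TABLE " + ", ".join(parts)
```

With row filtering, the predicate grows with every distinct where clause, which is the scaling concern raised above. With column/table filtering, the statement only changes when a shape touches a new table or column.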

We will decide on further steps once we get some numbers.

balegas added this to the Electric Scales milestone Oct 1, 2024
msfstef added a commit that referenced this issue Feb 25, 2025
… unsupported (#2367)

Fixes #2360

This will make a running Electric fall back to replicating whole tables
if it receives any shapes with unsupported where clauses (e.g. enums,
varchar with `IN` checks, user-defined data types in general, and who
knows what else).

There is no "recovery" mechanism to return to row-filtering, as the
Postgres error does not offer an easy way to check which where clause
caused the issue. Once we go to relation-only filtering we stay there,
like we would if an active shape had no where clause or if we were
running on PG14, which does not support row filters.
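
The one-way fallback described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the code from this commit: `FilterMode`, `FilterError`, `PublicationFilter`, and the `apply` callback are all hypothetical names, with `apply` standing in for whatever actually updates the publication.

```python
from enum import Enum
from typing import Callable


class FilterMode(Enum):
    ROW = "row"            # where clauses pushed down to Postgres
    RELATION = "relation"  # whole tables replicated, rows filtered afterwards


class FilterError(Exception):
    """Hypothetical error standing in for Postgres rejecting a row filter."""


class PublicationFilter:
    def __init__(self, apply: Callable[[FilterMode, list[str]], None]):
        self._apply = apply
        self.mode = FilterMode.ROW

    def update(self, where_clauses: list[str]) -> FilterMode:
        """Apply the current filters, downgrading permanently on failure."""
        if self.mode is FilterMode.ROW:
            try:
                self._apply(FilterMode.ROW, where_clauses)
                return self.mode
            except FilterError:
                # We cannot tell which clause was rejected, so we never
                # try row filtering again for the lifetime of this object.
                self.mode = FilterMode.RELATION
        self._apply(FilterMode.RELATION, where_clauses)
        return self.mode
```

Because `mode` is never reset, later updates that contain only supported where clauses still use relation-only filtering, which matches the "no recovery" behaviour described above.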

Ideally we would detect where clauses that are unsupported at the
relation filter processing level, so we can fine tune that, but until
then this fallback makes sure that Electric works even if an unsupported
where clause is provided.

As discussed in [this Discord
thread](https://discord.com/channels/933657521581858818/1341967559758581921),
we could also have a configuration flag and better errors to avoid this
sort of radical fallback, but we opted for an "always works" approach
here. IIRC some benchmarking had shown that our filtering is fast enough
that the PG level filtering might not be as important anyway, although
the limiting of transmitted data is definitely nice (despite several
issues we have with not being able to limit columns replicated etc).

This is related to
#1778, and I'm also
referencing #1831, as we
had encountered many limitations to row filtering, which led to this
proposed change.

We can definitely improve this by detecting unsupported where clauses,
checking filter diffs to know what caused the issue and reverting
afterwards, periodically attempting to return to row-filtering, or any
of several other approaches, but this allows all where clauses to be
accepted and Electric to adjust accordingly.