Tracking: Postgres sink issues #16745

Open
6 tasks
StrikeW opened this issue May 14, 2024 · 7 comments
Labels: help wanted, type/feature

Comments

@StrikeW
Contributor

StrikeW commented May 14, 2024

@github-actions github-actions bot added this to the release-1.10 milestone May 14, 2024
@fuyufjh
Member

fuyufjh commented May 14, 2024

I am thinking about rewriting the Postgres sink with Rust tokio_postgres and leveraging the ScalarAdapter

+1 for migrating from JDBC to a native implementation. My intuition is that JDBC adds another abstraction layer and makes it harder for us to keep data type conversions consistent across source and sink systems, not to mention the performance loss.

@xiangjinwu
Contributor

  • Agree to implement the sink in Rust and avoid going through JDBC.

  • I find that it might be difficult to support this via the JDBC interface, since java.sql.Types doesn't have a type corresponding to a UUID array, so we cannot get the uuid array type from the prepared statement.

    It is doable. The following snippet outputs _uuid, which means array of uuid in PostgreSQL (just like _float8 is double precision[]).

      PreparedStatement stp = conn.prepareStatement("select array['018f761c-a819-748a-863f-52204f67aa2a'::uuid] = ?;");
      System.out.println(stp.getParameterMetaData().getParameterTypeName(1));
    

    Just sharing that it is possible. For a better future, though, I am also in favor of the Rust implementation.
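The naming convention mentioned in this comment (`_uuid` for an array of `uuid`, `_float8` for `double precision[]`) could be interpreted mechanically on the client side. The following is only a minimal sketch, not code from the connector: the class `PgTypeNames` and its methods are hypothetical helpers, and it assumes the input is the string returned by `getParameterTypeName`. It relies purely on the leading-underscore naming convention, so it is not a substitute for looking the type up in the catalog.

```java
import java.util.Optional;

// Hypothetical helper (not part of JDBC or any driver): interprets a
// PostgreSQL type name such as the one returned by
// ParameterMetaData.getParameterTypeName.
final class PgTypeNames {
    private PgTypeNames() {}

    // By convention, PostgreSQL names an array type by prefixing the
    // element type name with an underscore, e.g. "_uuid" for uuid[].
    static boolean isArray(String typeName) {
        return typeName != null && typeName.length() > 1 && typeName.startsWith("_");
    }

    // Returns the element type name for an array type name, or empty otherwise.
    static Optional<String> elementType(String typeName) {
        return isArray(typeName) ? Optional.of(typeName.substring(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(elementType("_uuid")); // prints Optional[uuid]
        System.out.println(elementType("uuid"));  // prints Optional.empty
    }
}
```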

@StrikeW StrikeW added the good first issue Good for newcomers label May 15, 2024
@fuyufjh fuyufjh added help wanted Issues that need help from contributors and removed good first issue Good for newcomers labels May 16, 2024
@fuyufjh
Member

fuyufjh commented Jun 13, 2024

Today we hit a stability issue in the PostgreSQL sink that caused barriers to be stuck forever.

[Actor 445607]
Actor 445607: `<redacted>` [972.924s]
  Epoch 6619587039723520 [!!! 937.694s]
    Sink 6CCA700000002 [!!! 924.654s]
      Consume Log: sink_id: 46746 actor_id: 445607, executor_id: 1913867491868674 [!!! 972.924s]
        Wait Response Stream [!!! 937.694s]
        Wait Next Item: 6619587039723520 [!!! 923.734s]

Here, Wait Response Stream indicates that the barrier was not passed back from the Java connector.

Even worse, we found no helpful logs showing what was happening on the Java side.

@StrikeW
Contributor Author

StrikeW commented Jun 13, 2024

I think we should prioritize #17095 to enable sink decouple by default. After that, issues that happen in the sink would not affect the whole cluster.

@fuyufjh
Member

fuyufjh commented Jun 13, 2024

I think we should prioritize #17095 to enable sink decouple by default. After that, issues that happen in the sink would not affect the whole cluster.

They are independent problems, I think. Even with sink_decouple=true, if this unknown bug recurs and causes the JDBC connector to hang forever, the streaming job will eventually get stuck once the log store becomes full.
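This point can be illustrated with a toy model. It is only a sketch under stated assumptions: the log store is modeled as a bounded in-memory queue (`ArrayBlockingQueue`), and `BoundedLogStoreDemo`, `writeUntilFull`, and the capacities are hypothetical names and numbers unrelated to the real log store. With a consumer that never drains the queue (standing in for a hung connector), writes are accepted only until the buffer fills, and then the writer stalls.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model: a bounded buffer between the streaming job (writer) and a
// sink that has hung and therefore never consumes anything.
final class BoundedLogStoreDemo {
    private BoundedLogStoreDemo() {}

    // Attempts to append `attempts` items with no consumer running.
    // Returns how many items were accepted before the first stalled write.
    static int writeUntilFull(int capacity, int attempts) {
        BlockingQueue<Integer> logStore = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < attempts; i++) {
            boolean accepted;
            try {
                // A real writer would block here indefinitely; the short
                // timeout only lets this demo observe the stall and return.
                accepted = logStore.offer(i, 10, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return i;
            }
            if (!accepted) {
                return i; // buffer is full and nothing is draining it
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println(writeUntilFull(4, 10)); // prints 4
    }
}
```

In this model, decoupling buys time proportional to the buffer capacity, but it does not remove the stall if the consumer never recovers.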

@fuyufjh fuyufjh modified the milestones: release-2.0, release-2.1 Aug 19, 2024
@StrikeW StrikeW removed this from the release-2.1 milestone Oct 17, 2024
@kwannoel kwannoel self-assigned this Oct 30, 2024
@lmatz
Contributor

lmatz commented Nov 3, 2024

One issue a user mentioned recently: if different sinks point to the same destination (PG in this case), can they share the same connection (pool)?

@StrikeW
Contributor Author

StrikeW commented Nov 4, 2024

One issue a user mentioned recently: if different sinks point to the same destination (PG in this case), can they share the same connection (pool)?

As I mentioned in Notion, under the current scheduling policy it is hard to share connections across sinks. Embedding a connection pool in the CN would also make it stateful, which may limit scaling.
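To make the statefulness concern concrete, here is a rough sketch of what a per-process shared pool registry could look like. Everything in it is hypothetical and for illustration only: `SharedPoolRegistry`, `Pool`, and `poolFor` are invented names, `Pool` is a stand-in that opens no connections, and keying pools by a destination string is an assumption rather than a design from this thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-process registry that lets sinks targeting the same
// destination reuse one pool object.
final class SharedPoolRegistry {
    // Stand-in for a real connection pool; it only records its destination.
    static final class Pool {
        final String destination;

        Pool(String destination) {
            this.destination = destination;
        }
    }

    // This map is process-local state: it exists only on the node that
    // created it and is not shared with other nodes.
    private final Map<String, Pool> pools = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    // Returns the existing pool for a destination, creating it on first use.
    Pool poolFor(String destination) {
        return pools.computeIfAbsent(destination, d -> {
            created.incrementAndGet();
            return new Pool(d);
        });
    }

    int poolsCreated() {
        return created.get();
    }
}
```

In this sketch, two sinks share a pool only when they run in the same process and ask for the same key. Sinks scheduled onto different nodes would each get their own pool, and the registry contents would have to be rebuilt wherever a sink is moved.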
