[Bug]: Direct Runner doesn't use coder registered in registry? #29908

Closed
1 of 16 tasks
hjtran opened this issue Jan 3, 2024 · 4 comments · Fixed by #33932

Comments

@hjtran
Contributor

hjtran commented Jan 3, 2024

What happened?

I'm trying to write a coder for an unpicklable object, but even after I register it with the coder registry, the Direct Runner still tries to pickle it. I've created an example in Beam Playground.

I'm not sure if I'm just missing something trivial here.
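To make the failure mode concrete, here is a stdlib-only sketch, not Beam code, of why a custom coder is needed in the first place. An object that holds a lock cannot be pickled, but a hand-written encode/decode pair that serializes only the object's state round-trips without trouble. Connection and ConnectionCoder are hypothetical names used only for illustration.

```python
import pickle
import threading


class Connection:
    """Hypothetical unpicklable object: it holds a live lock."""

    def __init__(self, host: str):
        self.host = host
        self._lock = threading.Lock()  # locks cannot be pickled


class ConnectionCoder:
    """Hypothetical coder: encodes only the state needed to rebuild the object."""

    def encode(self, value: Connection) -> bytes:
        return value.host.encode("utf-8")

    def decode(self, encoded: bytes) -> Connection:
        # A fresh lock is created on decode instead of being serialized.
        return Connection(encoded.decode("utf-8"))


def try_pickle(obj) -> bool:
    """Return True if obj can be pickled, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


conn = Connection("db.example.internal")
print(try_pickle(conn))  # False: pickling the lock raises TypeError
coder = ConnectionCoder()
print(coder.decode(coder.encode(conn)).host)  # db.example.internal
```

In Beam itself, such a coder would subclass apache_beam.coders.Coder and be registered with apache_beam.coders.registry.register_coder(Connection, ConnectionCoder). That is the registration step this report describes.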

Issue Priority

Priority: 3 (minor)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@hjtran
Contributor Author

hjtran commented Jan 3, 2024

Might this be related to #18490?

@hjtran
Contributor Author

hjtran commented Jan 4, 2024

#18490 was a red herring. The issue isn't exactly with the Python Direct Runner either. I think the problem is that apache_beam.transforms.util.ReshufflePerKey uses Any type hints, and data typed as Any is encoded with the pickle-based fallback coder rather than with any coder registered in the coder registry.
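Here is a simplified model of the behavior described above. It assumes that coder lookup is keyed on the *declared* type hint rather than on each element's runtime type. This is an illustrative sketch, not Beam's actual CoderRegistry; ToyRegistry, Point and PointCoder are hypothetical.

```python
import pickle
from typing import Any, Dict


class PickleCoder:
    """Fallback coder: pickles whatever it is given."""

    def encode(self, value) -> bytes:
        return pickle.dumps(value)


class Point:
    def __init__(self, x: int, y: int):
        self.x, self.y = x, y


class PointCoder:
    """Custom coder registered for Point."""

    def encode(self, value: Point) -> bytes:
        return f"{value.x},{value.y}".encode("ascii")


class ToyRegistry:
    """Hypothetical registry: picks a coder from a declared type hint."""

    def __init__(self):
        self._coders: Dict[Any, type] = {}

    def register_coder(self, typ: type, coder_cls: type) -> None:
        self._coders[typ] = coder_cls

    def get_coder(self, type_hint):
        # Only the declared hint is consulted. A hint of Any matches no
        # registration, so the fallback is used even if every element is a Point.
        return self._coders.get(type_hint, PickleCoder)()


registry = ToyRegistry()
registry.register_coder(Point, PointCoder)

print(type(registry.get_coder(Point)).__name__)  # PointCoder
print(type(registry.get_coder(Any)).__name__)    # PickleCoder
```

Under this model, declaring a concrete type on a step avoids the problem, because the lookup then hits the registered coder instead of the fallback.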

@tvalentyn
Contributor

tvalentyn commented Jan 29, 2024

Have you tried setting with_output_types / with_input_types explicitly after Create or on Reshuffle?

@hjtran
Contributor Author

hjtran commented Jan 29, 2024

Yes, that does work. I think the real problem is that when this happens, it's hard to figure out why, especially if you assume the registered coder will always be respected.

I have a limited fix, not yet posted, that narrows the type definitions in ReshufflePerKey for global windows. It fixes part of the issue.
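Why narrowing the type definitions helps can be modeled the same way. This sketch assumes a registry that builds a tuple coder from its component hints. Once a step declares Tuple[K, V] instead of Any, the lookup can recurse into the components and find the registered value coder. It is a hypothetical toy, not Beam's registry and not the actual patch. coder_for, REGISTERED and the coder classes are illustrative names.

```python
import pickle
from typing import Any, Dict, Tuple, get_args, get_origin


class PickleCoder:
    """Fallback coder used when no better coder can be derived."""

    def encode(self, value) -> bytes:
        return pickle.dumps(value)


class StrCoder:
    def encode(self, value: str) -> bytes:
        return value.encode("utf-8")


class Point:
    def __init__(self, x: int, y: int):
        self.x, self.y = x, y


class PointCoder:
    def encode(self, value: Point) -> bytes:
        return f"{value.x},{value.y}".encode("ascii")


class TupleCoder:
    """Encodes each tuple component with its own coder."""

    def __init__(self, *component_coders):
        self.component_coders = component_coders

    def encode(self, value: tuple) -> bytes:
        parts = [c.encode(v) for c, v in zip(self.component_coders, value)]
        # Length-prefix each part so the concatenation is unambiguous.
        return b"".join(len(p).to_bytes(4, "big") + p for p in parts)


REGISTERED: Dict[Any, type] = {str: StrCoder, Point: PointCoder}


def coder_for(type_hint):
    """Hypothetical lookup: derives tuple coders from their component hints."""
    if get_origin(type_hint) is tuple:
        return TupleCoder(*(coder_for(arg) for arg in get_args(type_hint)))
    return REGISTERED.get(type_hint, PickleCoder)()


wide = coder_for(Any)                  # what an Any-typed step gets
narrow = coder_for(Tuple[str, Point])  # what a narrowed KV step gets
print(type(wide).__name__)  # PickleCoder
print([type(c).__name__ for c in narrow.component_coders])  # ['StrCoder', 'PointCoder']
```

The design point is that a generic hint such as Tuple carries enough structure for the registry to reach the user's registered coder, whereas Any carries none.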


2 participants