GroupBy not working in tests due to coder inequality #5009
Comments
This sounds familiar. Thanks for providing the test cases, will look into it ASAP.
Here are the debugging results:

it should "work correctly with scio's group by" in {
  runWithContext { sc =>
    sc.parallelize(List(TestRecord("key1", "value1")))
      .groupBy(_.key)
  }
}

Does not fail with the same exception:

basically, test classes are not serializable. If you move the

Concerning the 1st test case

it should "work correctly with beams native group by key" in {
  runWithContext { sc =>
    sc.parallelize(List(TestRecord("key1", "value1")))
      .applyKvTransform(WithKeys.of(new SerializableFunction[TestRecord, String] {
        override def apply(input: TestRecord): String = input.key
      }))
      .applyKvTransform(GroupByKey.create[String, TestRecord]())
  }
}

You are right. Since beam

The workaround is to use

// explicit
.applyKvTransform(GroupByKey.create[String, TestRecord]())(Coder.stringCoder, Coder.aggregate)

// implicit
implicit def groupCoder[T: Coder]: Coder[java.lang.Iterable[T]] = Coder.aggregate[T]
.applyKvTransform(GroupByKey.create[String, TestRecord]())
Thanks for taking a look, and sorry that I mixed up that the second test case failed due to a different error. I can confirm that the workaround works fine. I suppose that this issue might also appear for other users that need to work with
The problem is not the
While upgrading scio from 0.12 to the latest 0.13.3 and Apache Beam from 2.41 to 2.50, I observed that Beam's GroupByKey no longer works in tests, as it complains about the equality of coders. This is probably related to PR apache/beam#22702, which was also mentioned here in this repo.
The following code shows the error:
The first two cases fail as there is a record involved.
The exception states:
From the exception the coders seem to be identical. While debugging I discovered that it is somehow related to coder materialization as here a materialized coder is compared to a non-materialized value coder:
As mentioned, this only appears in tests. With the DirectRunner it works, because there two non-materialized coders are compared.
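To illustrate the kind of mismatch described above, here is a minimal, dependency-free sketch. The types `ToyCoder`, `ToyStringCoder` and `ToyMaterializedCoder` are hypothetical stand-ins made up for this example. They are not scio's or Beam's actual classes, and the real equality check may differ in its details. The sketch only shows how a wrapped and an unwrapped coder can encode identically and still fail an equality comparison.

```scala
// Hypothetical stand-ins for illustration only; NOT scio's or Beam's real classes.
trait ToyCoder[T] { def encode(value: T): String }

// A plain coder modelled as a case class, so equality is structural.
final case class ToyStringCoder() extends ToyCoder[String] {
  def encode(value: String): String = value
}

// A wrapper that delegates encoding but does not override equals,
// so it is only ever equal to itself (reference equality).
final class ToyMaterializedCoder[T](val underlying: ToyCoder[T]) extends ToyCoder[T] {
  def encode(value: T): String = underlying.encode(value)
}

object CoderEqualityDemo {
  val raw: ToyCoder[String] = ToyStringCoder()
  val wrapped: ToyCoder[String] = new ToyMaterializedCoder(raw)

  def main(args: Array[String]): Unit = {
    // Both coders produce the same encoded output...
    assert(raw.encode("key1") == wrapped.encode("key1"))
    // ...but comparing the wrapped coder with the unwrapped one fails.
    assert(raw != wrapped)
    // Comparing like with like succeeds.
    assert(raw == ToyStringCoder())
  }
}
```

Under these assumptions, a check that requires the input value coder to equal the output element coder would reject the pair whenever only one side is wrapped, even though both describe the same encoding.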
Am I doing something wrong here, or is there a workaround? I already tried setting the coders manually through setCoder.
Thanks for taking a look at this issue.