Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mixed][Improvement]:Support for serialization and deserialization thread-safe in lookup JOIN #2497

Closed
3 tasks done
Tracked by #2176
YesOrNo828 opened this issue Jan 8, 2024 · 0 comments · Fixed by #2498
Closed
3 tasks done
Tracked by #2176

Comments

@YesOrNo828
Copy link
Contributor

YesOrNo828 commented Jan 8, 2024

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

Serialize and deserialize operations are not thread-safe in the BinaryRowDataSerializerWrapper.

Using the reuseRow and reuseWriter to serialize the RowData could speed the serialization process up, but it may lead to serializing incorrectly in multi-threads.

Related logs:

Caused by: java.lang.IndexOutOfBoundsException: offset: 0, length: 64, size: 48
	at org.apache.flink.core.memory.DataOutputSerializer.write(DataOutputSerializer.java:134) ~[plugin_ne-flink-1.14.0-1.0.13_scala2.12_hive2.1.1_arctic0.5.1.4-release-3.9.16-1.4.11.0.1.jar:?]
	at org.apache.flink.table.runtime.typeutils.BinaryRowDataSerializer.serializeWithoutLength(BinaryRowDataSerializer.java:146) ~[flink-table_2.12-ne-flink-1.14.0-1.0.13.jar:ne-flink-1.14.0-1.0.13]
	at org.apache.flink.table.runtime.typeutils.BinaryRowDataSerializer.serialize(BinaryRowDataSerializer.java:88) ~[flink-table_2.12-ne-flink-1.14.0-1.0.13.jar:ne-flink-1.14.0-1.0.13]
	at com.netease.arctic.flink.lookup.BinaryRowDataSerializerWrapper.serialize(BinaryRowDataSerializerWrapper.java:61) ~[plugin_ne-flink-1.14.0-1.0.13_scala2.12_hive2.1.1_arctic0.5.1.4-release-3.9.16-1.4.11.0.1.jar:?]
	at com.netease.arctic.flink.lookup.RocksDBCacheState.serializeKey(RocksDBCacheState.java:144) ~[plugin_ne-flink-1.14.0-1.0.13_scala2.12_hive2.1.1_arctic0.5.1.4-release-3.9.16-1.4.11.0.1.jar:?]
	at com.netease.arctic.flink.lookup.RocksDBSetSpilledState.serializeKey(RocksDBSetSpilledState.java:78) ~[plugin_ne-flink-1.14.0-1.0.13_scala2.12_hive2.1.1_arctic0.5.1.4-release-3.9.16-1.4.11.0.1.jar:?]
	at com.netease.arctic.flink.lookup.RocksDBSetSpilledState.get(RocksDBSetSpilledState.java:114) ~[plugin_ne-flink-1.14.0-1.0.13_scala2.12_hive2.1.1_arctic0.5.1.4-release-3.9.16-1.4.11.0.1.jar:?]
	at com.netease.arctic.flink.lookup.SecondaryIndexTable.get(SecondaryIndexTable.java:85) ~[plugin_ne-flink-1.14.0-1.0.13_scala2.12_hive2.1.1_arctic0.5.1.4-release-3.9.16-1.4.11.0.1.jar:?]
	at com.netease.arctic.flink.lookup.BasicLookupFunction.lookup(BasicLookupFunction.java:189) ~[plugin_ne-flink-1.14.0-1.0.13_scala2.12_hive2.1.1_arctic0.5.1.4-release-3.9.16-1.4.11.0.1.jar:?]
	... 78 more

How should we improve?

ThreadLocal provides an easy-to-use API to confine some values to each thread. This is a reasonable way of achieving thread safety in Java.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Subtasks

No response

Code of Conduct

@YesOrNo828 YesOrNo828 changed the title [Improvement]:Support for serialization and deserialization thread-safe in lookup join [Mixed][Improvement]:Support for serialization and deserialization thread-safe in lookup JOIN Jan 8, 2024
@zhoujinsong zhoujinsong mentioned this issue Jun 25, 2024
66 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant