Description
Search before asking
- I searched in the issues and found nothing similar.
Fluss version
main
Minimal reproduce step
Currently, sendMetadataRequestAndRebuildCluster has a timeout of 3s. After 3s, the future completes without updating the metadata.
What doesn't meet your expectations?
In my case, with a partitioned table of 512 buckets and a Flink sink parallelism of 512, the request times out easily and causes the sink job to fail.
When first writing to a partition, the client calls checkAndUpdatePartitionMetadata to fetch that partition's metadata. If the request times out, the client's metadata is not updated, and the write then throws a PartitionNotExists exception although the partition does exist.
In my case, a timeout of 60s works.
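To illustrate the failure mode described above, here is a minimal sketch in plain Java. It is not Fluss code: the class SilentTimeoutSketch, its methods, and the partition name are hypothetical stand-ins, and the real client is more involved. It only shows how a timeout that is logged and swallowed leaves the cache stale, so the later lookup reports a missing partition.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a client-side metadata cache; not Fluss code. */
final class SilentTimeoutSketch {
    private final Set<String> knownPartitions = ConcurrentHashMap.newKeySet();

    /** Waits for the response but only logs on timeout, so the cache stays stale. */
    void updateSwallowingTimeout(CompletableFuture<Set<String>> response, long timeoutMs) {
        try {
            knownPartitions.addAll(response.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // The timeout is logged and then lost; the caller never learns about it.
            System.err.println("Metadata update timed out after " + timeoutMs + " ms");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            System.err.println("Metadata update failed: " + e.getCause());
        }
    }

    /** Caller-side check that cannot tell "stale cache" from "does not exist". */
    String lookup(String partition) {
        if (!knownPartitions.contains(partition)) {
            return "PartitionNotExists: " + partition;
        }
        return "OK: " + partition;
    }

    public static void main(String[] args) {
        SilentTimeoutSketch cache = new SilentTimeoutSketch();

        // A response that never arrives within the timeout, simulating a slow server.
        CompletableFuture<Set<String>> slow = new CompletableFuture<>();
        cache.updateSwallowingTimeout(slow, 50);
        System.out.println(cache.lookup("p=2024"));

        // The same lookup succeeds once a response arrives in time.
        CompletableFuture<Set<String>> fast =
                CompletableFuture.completedFuture(Collections.singleton("p=2024"));
        cache.updateSwallowingTimeout(fast, 50);
        System.out.println(cache.lookup("p=2024"));
    }
}
```

In this sketch the first lookup reports the partition as missing only because the update timed out, which mirrors the misleading error seen in the sink job.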
Anything else?
I can see that we need a request timeout mechanism so that a request cannot hang forever. But a metadata update request should throw a timeout exception instead of just logging it, so that the caller can decide whether to retry or fail directly.
For example, creating a FlussTable fetches the table's metadata via metadataUpdater.checkAndUpdateTableMetadata(Collections.singleton(tablePath)). If that request times out, the metadata is not updated, and a table-not-found-in-cluster exception is thrown although the table does exist. At the least, it should throw a timeout exception instead of the table-not-found exception, which is really confusing.
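A rough sketch of the behavior proposed here, again in plain Java rather than Fluss code. The class PropagatingTimeoutSketch, the openTable helper, and the bounded-retry policy are all assumptions made for illustration; the actual fix may look quite different. The point is only that the update rethrows the timeout, so "timed out" and "table not found" stay distinguishable.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical sketch of a metadata update that propagates timeouts; not Fluss code. */
final class PropagatingTimeoutSketch {
    private final Set<String> knownTables = ConcurrentHashMap.newKeySet();

    /** Throws TimeoutException instead of logging it, so the caller chooses retry or fail. */
    void update(CompletableFuture<Set<String>> response, long timeoutMs)
            throws TimeoutException, InterruptedException, ExecutionException {
        knownTables.addAll(response.get(timeoutMs, TimeUnit.MILLISECONDS));
    }

    /** One possible caller policy (an assumption): bounded retries, then surface the timeout. */
    static void openTable(PropagatingTimeoutSketch metadata, String table,
            Supplier<CompletableFuture<Set<String>>> request, int maxAttempts, long timeoutMs)
            throws TimeoutException, InterruptedException, ExecutionException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                metadata.update(request.get(), timeoutMs);
                // Metadata was actually fetched, so "not found" is now a trustworthy answer.
                if (!metadata.knownTables.contains(table)) {
                    throw new IllegalStateException("Table not found in cluster: " + table);
                }
                return;
            } catch (TimeoutException e) {
                if (attempt == maxAttempts) {
                    throw e; // Out of attempts: report the timeout, not "table not found".
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        PropagatingTimeoutSketch metadata = new PropagatingTimeoutSketch();
        try {
            // Each request never completes, simulating an overloaded server.
            openTable(metadata, "db.orders", () -> new CompletableFuture<>(), 2, 50);
        } catch (TimeoutException e) {
            System.out.println("Metadata update timed out; the table may still exist");
        }
    }
}
```

With this shape, a caller such as the sink could choose a longer timeout or more attempts without the error being rewritten into a not-found exception.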
Are you willing to submit a PR?
- I'm willing to submit a PR!