Here is a case where data throughput differs significantly between day and night. A Flink job's sink parallelism is immutable once the job has started. Take an un-keyed table as an example: if the 'none' distribution policy is selected, all sink parallelism is utilized, but it also creates the largest number of files, even when only a small amount of data arrives. For example, with a sink parallelism of 100 and a one-minute checkpoint interval, each checkpoint can commit up to 100 new files, regardless of data volume.
Therefore, I'd like to propose a new distribution policy that dynamically allocates writing parallelism during streaming writes according to the volume of incoming data. In this way, we can not only relieve the pressure of small files on HDFS, but also increase the efficiency of reading and of optimization work.
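To make the idea concrete, here is a minimal sketch of what such a policy could look like on the Flink side: a custom `Partitioner` that round-robins records over only a subset of the downstream writer subtasks, growing or shrinking that subset based on a simple rate estimate. This is only an illustration of the concept, not a proposed implementation; the class name, sampling window, and per-writer threshold are all hypothetical, and a real version would need coordination across subtasks, since each upstream subtask only observes its own share of the traffic.

```java
// Hypothetical sketch of throughput-aware routing: round-robin records over
// only as many downstream writer channels as the recent rate justifies, so
// idle writers receive nothing and produce no files between checkpoints.
import org.apache.flink.api.common.functions.Partitioner;

public class ThroughputAwarePartitioner<T> implements Partitioner<T> {

    private static final long WINDOW_MS = 60_000L;           // rate sampling window (assumed)
    private static final long RECORDS_PER_WRITER = 10_000L;  // target load per writer (assumed)

    private long windowStart = System.currentTimeMillis();
    private long recordsInWindow = 0L;
    private int activeWriters = 1;  // start small; grow as traffic grows
    private int nextChannel = 0;

    @Override
    public int partition(T key, int numPartitions) {
        recordsInWindow++;
        long now = System.currentTimeMillis();
        if (now - windowStart >= WINDOW_MS) {
            // Re-estimate how many writers the current rate justifies,
            // clamped to [1, numPartitions].
            long needed = recordsInWindow / RECORDS_PER_WRITER + 1;
            activeWriters = (int) Math.min(Math.max(needed, 1L), numPartitions);
            windowStart = now;
            recordsInWindow = 0L;
        }
        // Round-robin over the currently active subset of channels only.
        nextChannel = (nextChannel + 1) % activeWriters;
        return nextChannel;
    }
}
```

It could be wired in ahead of the sink with something like `stream.partitionCustom(new ThroughputAwarePartitioner<>(), r -> r)` (again, just a sketch); writer subtasks that receive no records between two checkpoints would then commit no files, which is exactly the small-file relief described above.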