Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added max_s3_upload_threads_per_task option to Redshift #334

Conversation

t3t5u
Copy link
Contributor

@t3t5u t3t5u commented Jan 17, 2024

The changes are essentially the same as in this PR.
I've fixed the points commented on in the above review.

@t3t5u t3t5u requested a review from a team as a code owner January 17, 2024 01:37
@t3t5u
Copy link
Contributor Author

t3t5u commented Jan 17, 2024

@dmikurube @hiroyuki-sato
Please review. 🙇‍♂️

@t3t5u t3t5u force-pushed the add_max_s3_upload_threads_per_task_for_redshift branch from 24cd4e4 to 9121b60 Compare January 17, 2024 01:45
Copy link
Member

@hiroyuki-sato hiroyuki-sato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to know why this PR changed some part (i.e default value) from the original.
I don't have any knowledge which is better yet.

@@ -83,6 +83,10 @@ public interface RedshiftPluginTask extends AwsCredentialsTaskWithPrefix, Plugin
@ConfigDefault("true")
public boolean getDeleteS3TempFile();

@Config("max_s3_upload_threads_per_task")
@ConfigDefault("null")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value of the original PR is 5. Why you change this value?
https://github.com/embulk/embulk-output-jdbc/pull/311/files#diff-3ad638974347d49a3df0afc40dc2fa00382b91105aa42eeec25468daccbfc434R54

Any update after this comment in trocco servcie?
#311

As newCachedThreadPool would create new thread for execution if and only if current thread are all occupied, it seems tenable to pass a maximum of thread to newCachedThreadPool. However, due to the execution mechanism of embulk is ansync we may have troubles in how to determine the max of it. The most simple way to resolve is just make the maximum of each thread pool as a argument. After some performance test, it seems fine to make the default value of thread_maximum as 5.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've kept the original behavior by default, due to the requested in the following comment.

#311 (review)

It changes the thread behavior even if thread_maximum is not specified.
newCachedThreadPool() and newFixedThreadPool(5) behave different.
Can you make the it Optional<Interger>, and keep the behavior when the option is not specified?

Any update after this comment in trocco servcie?

No, the maximum number of threads remains 5 in the trocco.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, Thanks. Do you think whether is is better to change 5 by default in future release?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a problem with the change, but I don't think it needs to be changed.
I think users would change the configuration as needed.

this.executorService = Executors.newCachedThreadPool();
this.executorService = maxS3UploadThreadsPerTask != null
? Executors.newFixedThreadPool(maxS3UploadThreadsPerTask)
: Executors.newCachedThreadPool();
this.uploadAndCopyFutures = new ArrayList<Future<Void>>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original PR change thread pool from newThreadPool to Executors.newFixedThreadPool(threadMaximum).
Why you change this part from the origianl?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the same reason as above, if max_s3_upload_threads_per_task is not specified, continue to use newCachedThreadPool to keep the original behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Thanks

@dmikurube dmikurube requested a review from a team January 17, 2024 07:55
Copy link
Member

@dmikurube dmikurube left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change basically looks good to me, as it keeps the original behavior, and it follows the feedbacks in the previous pull request.

Two points :

  1. Wanted to hear other comments from @embulk/jdbc-maintainers
  2. If some more change requests are expected to come from trocco, we may want to consider re-organizing the maintainer team and the maintenance policies. We now have some discussions on Zulip with @kekekenta, and let's keep going on.

@dmikurube dmikurube requested a review from a team January 17, 2024 08:03
Copy link
Member

@hiroyuki-sato hiroyuki-sato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, @t3t5u Thank you for your comment.

What is the relationship #311 and this PR?
IIUC, Both PRs propose from trocco team.
I tought This PR was proposed for replace #311. Correct?

@@ -83,6 +83,10 @@ public interface RedshiftPluginTask extends AwsCredentialsTaskWithPrefix, Plugin
@ConfigDefault("true")
public boolean getDeleteS3TempFile();

@Config("max_s3_upload_threads_per_task")
@ConfigDefault("null")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, Thanks. Do you think whether is is better to change 5 by default in future release?

this.executorService = Executors.newCachedThreadPool();
this.executorService = maxS3UploadThreadsPerTask != null
? Executors.newFixedThreadPool(maxS3UploadThreadsPerTask)
: Executors.newCachedThreadPool();
this.uploadAndCopyFutures = new ArrayList<Future<Void>>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Thanks

Copy link
Member

@hiroyuki-sato hiroyuki-sato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, This PR LGTM. If it is better to change max_s3_upload_threads_per_task default value, let's change default in future release. For preparation, adding log message this value will change in future release... is one of choice.

otherwise, Let's keep current behavior.

@t3t5u
Copy link
Contributor Author

t3t5u commented Jan 17, 2024

@hiroyuki-sato

I tought This PR was proposed for replace #311. Correct?

That is correct. This PR is a reworked replacement for #311.

otherwise, Let's keep current behavior.

Thank you for the advice.
I think the default behavior doesn't need to be changed at this time.

@hiroyuki-sato
Copy link
Member

@t3t5u Thanks! LGTM👍

@dmikurube dmikurube added this to the v0.10.5 milestone Feb 16, 2024
@dmikurube
Copy link
Member

Thanks for your contribution. Merging this for v0.10.5...

@dmikurube dmikurube merged commit feae8a0 into embulk:master Feb 22, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants