-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
delete files in advance #29
base: master
Are you sure you want to change the base?
Conversation
Why do you want to delete files in advance? Could you provide the background? |
For example, when embulk task size is changed, the number of files will be changed, so unnecessary files will be remained.
I would like to assure only transferred files are present. |
In my understanding, the number of output files is varied by input record size, and so on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I understand your intention and sounds nice.
I added several comments.
Could you also add unit tests if possible?
You don't have to mind CI failure. It is caused by encryption key settings on Travis CI.
logger.info("delete files"); | ||
Storage client = createClient(task); | ||
try { | ||
Objects listResult = client.objects().list(task.getBucket()).setDelimiter("/").setPrefix(task.getPathPrefix()).execute(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use retry with exponential backoff when plugin sends request to Google API?
https://github.com/embulk/embulk-output-gcs/blob/v0.4.4/src/main/java/org/embulk/output/GcsTransactionalFileOutput.java#L167
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks.
try { | ||
Objects listResult = client.objects().list(task.getBucket()).setDelimiter("/").setPrefix(task.getPathPrefix()).execute(); | ||
if (listResult.getItems() == null) { | ||
logger.info("no files found"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: "no files were found" or "no file was found"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
} | ||
for (StorageObject item : listResult.getItems()) { | ||
logger.info("delete file: {}/{}", item.getBucket(), item.getName()); | ||
client.objects().delete(item.getBucket(), item.getName()).execute(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to move logger.info("delete file: {}/{}", item.getBucket(), item.getName());
after sending requests.
Only logging message will be shown in case delete operation fails,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also use retry with exponential backoff?
} | ||
} | ||
catch (ConfigException | IOException ex) { | ||
throw Throwables.propagate(ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think ConfigException is thrown from any part of this method.
But it would be nice to throw ConfigException if IOException happens.
|
||
@Config("delete_in_advance") | ||
@ConfigDefault("false") | ||
Boolean getDeleteInAdvance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boolean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry..
@@ -78,6 +88,26 @@ public TransactionalFileOutput open(TaskSource taskSource, final int taskIndex) | |||
return new GcsTransactionalFileOutput(task, client, taskIndex); | |||
} | |||
|
|||
private void deleteFiles(PluginTask task) | |||
{ | |||
logger.info("delete files"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message isn't clear to me what this plugin is doing.
"Start delete files operation" or something might be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I changed this to your suggestions.
Storage client = createClient(task); | ||
try { | ||
Objects listResult = client.objects().list(task.getBucket()).setDelimiter("/").setPrefix(task.getPathPrefix()).execute(); | ||
if (listResult.getItems() == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (listResult.getItems().size() == 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
listResult.getItems()
returns null when no files are found.
@sakama Thanks a lot. I will check your review comments. |
71ed3ba
to
1b7e0a9
Compare
@sakama I added some test codes and fixed as you pointed out. |
@sakama Sorry for bother you, would you review my code again in your free time! |
Sorry to have kept you waiting. I'll review in few days. |
@kekekenta Sorry for leaving this pull-request for a long time. The CI settings of this plugin was not working well (especially with Travis-CI) for these months. We've gotten the CI back with GitHub Actions and a new GCS bucket configuration. It should be working now. If you still have an intention to work on it to merge, could you rebase it to the latest We had a lot of changes in v0.5.0 from v0.4.4, and going to have more echanges in v0.6.0 soon. For example, catching up with the Embulk v0.10, relocating Java packages, changing its license to the Apache License 2.0, and else. Rebasing wouldn't be very trivial. |
See https://dev.embulk.org/topics/get-ready-for-v0.11-and-v1.0.html (Ja: https://zenn.dev/dmikurube/articles/get-ready-for-embulk-v0-11-and-v1-0) for the Embulk v0.10 catch-up. |
1b7e0a9
to
fb782af
Compare
We apologise for the very late response. 🙇 |
Delete path prefix matched files in GCS bucket first if needed.