-
Notifications
You must be signed in to change notification settings - Fork 92
add --zip option to zip the submission file prior to uploading it #52
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution!
And there are some feedbacks to the implementation, before the merge, let's discuss and make some updates on it. Thank you.
kaggle_cli/submit.py
Outdated
@@ -25,7 +25,7 @@ def get_parser(self, prog_name): | |||
parser.add_argument('-u', '--username', help='username') | |||
parser.add_argument('-p', '--password', help='password') | |||
parser.add_argument('-z', '--zip', type=self._str2bool, nargs='?', const=True, default=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you use the same argument setting as config.py
?
i.e. parser.add_argument('-z', '--zip', help='zip the submission file before uploading?', action='store_true')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree
kaggle_cli/config.py
Outdated
@@ -98,7 +98,7 @@ def take_action(self, parsed_args): | |||
parsed_arg_dict = vars(parsed_args) | |||
|
|||
if DATA_OPTIONS & set( | |||
filter(lambda x: parsed_arg_dict[x], parsed_arg_dict) | |||
filter(lambda x: parsed_arg_dict[x], parsed_arg_dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it an accident duplicated indent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDE issues I guess
kaggle_cli/submit.py
Outdated
@@ -35,7 +35,12 @@ def take_action(self, parsed_args): | |||
username = config.get('username', '') | |||
password = config.get('password', '') | |||
competition = config.get('competition', '') | |||
zip = config.get('zip', False) | |||
zip_flag = config.get('zip', 'no') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use getboolean method instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@floydwch but the config
object here is just a python dict, it's not a ConfigParser object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue should be resolved at the config provider level, I'll address it.
kaggle_cli/submit.py
Outdated
@@ -46,7 +51,7 @@ def take_action(self, parsed_args): | |||
entry = parsed_args.entry | |||
message = parsed_args.message | |||
|
|||
archive_name = Submit._rand_str(10)+'.zip' | |||
archive_name = Submit._rand_str(10) + '.zip' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the uploaded file's name will also show on the Kaggle's dashboard. A related name might be better. i.e. append the uuid or timestamp to the original submission file's name.
And another concern there is, for myself habit, the submission files are already suffixed a timestamp to distinguish the trial. Adding additional suffix is duplicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. What if I prepend a short random string instead?
Thanks for the feedback. I'll edit some things and remake the request |
I've applied the changes you suggested (except for using config.getboolean(), because the object is not a ConfigParser, but a normal python dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New feedbacks here. And please also update the README.md
to explain the new function.
And I'll update the master branch to provide the coerced bool field. You can rebase to the master after I updated it.
kaggle_cli/submit.py
Outdated
@@ -35,7 +35,12 @@ def take_action(self, parsed_args): | |||
username = config.get('username', '') | |||
password = config.get('password', '') | |||
competition = config.get('competition', '') | |||
zip = config.get('zip', False) | |||
zip_flag = config.get('zip', 'no') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue should be resolved at the config provider level, I'll address it.
kaggle_cli/submit.py
Outdated
zip = True | ||
else: | ||
zip = False | ||
zip = Submit._str2bool(zip_flag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overriding the built-in function name is discouraged. You can just apply the _str2bool
to the zip_flag = config.get('zip', 'no')
, i.e. zip_flag = Submit._str2bool(config.get('zip', 'no'))
.
But I'll address the boolean coercion issue at config provider level, you can get a boolean by config.get('zip')
after I updated it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kaggle_cli/submit.py
Outdated
archive_name = Submit._rand_str(10) + '.zip' | ||
archive_name = Submit._make_archive_name(entry) | ||
|
||
# print(archive_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incident comments here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops!
kaggle_cli/submit.py
Outdated
@@ -129,29 +130,38 @@ def take_action(self, parsed_args): | |||
os.remove(target_name) | |||
|
|||
@staticmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Static method is not the best choice in such case. And an underscore prefix method indicates this method is private, which means it'll access the object's private fields, but it's not the case here. I suggest you extract the method to the module scope, i.e. move the _make_archive_name
outside the Submit
class but still in the submit.py
.
kaggle_cli/submit.py
Outdated
else: | ||
archive_name = original_basename+".zip" | ||
|
||
# this is used to prevent caching issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still can't understand why it has caching issues. Can you explain the scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this but I think that when we upload two submissions (different files) with the same same (say, "mysubmission.csv"), this may trigger some form of caching in Kaggle
(They may only check the file name and, if the file name is something they have seen before, they may not even read the file).
Again, I'm not sure about this but since caching is done by many people on many websites, I though it would be a safe thing to do at little cost (just add a couple characters to the file name).
But I agree it's not essential.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caching usually applied on read operation, caching a file upload would be considered a bug.
But I just tested if Kaggle caches the file? The answer is no.
I uploaded the benchmark model for the titanic competition and updated the file's content manually and uploaded again to see if it's cached. The scores returned by Kaggle were different in the two submissions. So basically, it's not an issue.
@floydwch the changes have been ready for a couple of days now... anything you want to add? |
@queirozfcom sorry, I'm busy now. I'll reply to you ASAP. |
@queirozfcom I added new feedback on the caching issue. And another concern is, please rebase the commits to the master branch for better commit logs, don't use merge. |
If option is passed, a zip file is created (using python zipfile module) with a random name (cache busting) and the zipped file is uploaded instead, to use less bandwidth. The zip file is deleted afterwards to avoid using up too much disk.