This repository has been archived by the owner on Jan 27, 2024. It is now read-only.

add --zip option to zip the submission file prior to uploading it #52

Open: wants to merge 18 commits into base: master (the diff below shows changes from 1 commit)
kaggle_cli/config.py: 2 changes (1 addition, 1 deletion)
@@ -98,7 +98,7 @@ def take_action(self, parsed_args):
        parsed_arg_dict = vars(parsed_args)

        if DATA_OPTIONS & set(
-            filter(lambda x: parsed_arg_dict[x], parsed_arg_dict)
+            filter(lambda x: parsed_arg_dict[x], parsed_arg_dict)
        ):
            if parsed_arg_dict['global']:
                config_dir = os.path.join(
kaggle_cli/submit.py: 54 changes (32 additions, 22 deletions)
@@ -24,8 +24,7 @@ def get_parser(self, prog_name):
        parser.add_argument('-c', '--competition', help='competition')
        parser.add_argument('-u', '--username', help='username')
        parser.add_argument('-p', '--password', help='password')
-        parser.add_argument('-z', '--zip', type=self._str2bool, nargs='?', const=True, default=False,
-                            help='whether to zip the submission file before uploading')
+        parser.add_argument('-z', '--zip', help='zip the submission file before uploading?', action='store_true')

        return parser
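For context, a minimal sketch of how the two --zip definitions above parse. It uses standalone parsers and a simplified str2bool (hypothetical stand-ins, not the project's code): the old nargs='?' / const=True form treats a bare flag as True and also accepts an explicit value, while action='store_true' is a plain on/off switch.

```python
import argparse


def str2bool(v):
    """Simplified stand-in for Submit._str2bool (hypothetical)."""
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    if v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    raise argparse.ArgumentTypeError('Boolean value expected.')


def old_parser():
    # Original form: optional value; bare flag -> const, absent -> default.
    parser = argparse.ArgumentParser()
    parser.add_argument('-z', '--zip', type=str2bool, nargs='?',
                        const=True, default=False)
    return parser


def new_parser():
    # Revised form: False unless the flag is present, never takes a value.
    parser = argparse.ArgumentParser()
    parser.add_argument('-z', '--zip', action='store_true',
                        help='zip the submission file before uploading')
    return parser


if __name__ == '__main__':
    print(old_parser().parse_args([]).zip)               # False
    print(old_parser().parse_args(['--zip']).zip)        # True
    print(old_parser().parse_args(['--zip', 'no']).zip)  # False
    print(new_parser().parse_args(['-z']).zip)           # True
```

The trade-off of store_true is that the value can no longer be spelled out explicitly on the command line; in exchange, the option needs no custom type function.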

@@ -37,10 +36,7 @@ def take_action(self, parsed_args):
        competition = config.get('competition', '')
        zip_flag = config.get('zip', 'no')
Owner:
You can use the getboolean method instead.

Author:
@floydwch But the config object here is just a Python dict, not a ConfigParser object.

Owner:
This issue should be resolved at the config provider level; I'll address it.
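To illustrate the distinction in this thread: configparser.ConfigParser.getboolean coerces strings such as "yes"/"no" into booleans, whereas a plain dict just hands back the stored string and has no getboolean method. The section name "user" and the sample values below are made up for the example.

```python
import configparser

# A ConfigParser stores values as strings, but getboolean() interprets
# '1'/'yes'/'true'/'on' and '0'/'no'/'false'/'off' (case-insensitively).
parser = configparser.ConfigParser()
parser.read_string("[user]\nzip = yes\n")
as_bool = parser.getboolean('user', 'zip')

# A plain dict (what the config object in submit.py is, per the author)
# has no such method; .get() simply returns the raw string.
plain = {'zip': 'yes'}
as_str = plain.get('zip', 'no')
has_getboolean = hasattr(plain, 'getboolean')

print(as_bool, repr(as_str), has_getboolean)  # True 'yes' False
```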


-        if Submit._str2bool(zip_flag):
-            zip = True
-        else:
-            zip = False
+        zip = Submit._str2bool(zip_flag)
Owner:
Shadowing the built-in function name zip is discouraged. You can apply _str2bool directly where zip_flag is read, i.e. zip_flag = Submit._str2bool(config.get('zip', 'no')).

But I'll address the boolean coercion issue at the config provider level; once I've updated it, you can get a boolean from config.get('zip').

Owner (@floydwch, Oct 24, 2017):
The type coercion enhancement is done; see 9158c36 and 551a1b2.

You can now use config.get('zip') directly.
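A small sketch of why shadowing zip matters, next to the single-assignment pattern suggested above. Here str2bool is a simplified, hypothetical stand-in for Submit._str2bool, and config is a plain dict standing in for the real config object.

```python
def str2bool(v):
    """Simplified stand-in for Submit._str2bool (hypothetical)."""
    return v.lower() in ('yes', 'true', 't', 'y', '1')


def shadowed(config):
    # The local name now hides the built-in zip() for the rest of the function.
    zip = str2bool(config.get('zip', 'no'))
    try:
        return list(zip([1, 2], ['a', 'b']))
    except TypeError as exc:
        return 'TypeError: %s' % exc


def suggested(config):
    # Coerce once into a distinct name, leaving the built-in untouched.
    zip_flag = str2bool(config.get('zip', 'no'))
    pairs = list(zip([1, 2], ['a', 'b']))
    return zip_flag, pairs


config = {'zip': 'yes'}
print(shadowed(config))   # TypeError: 'bool' object is not callable
print(suggested(config))  # (True, [(1, 'a'), (2, 'b')])
```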


        browser = common.login(username, password)
        base = 'https://www.kaggle.com'
@@ -51,7 +47,12 @@ def take_action(self, parsed_args):
        entry = parsed_args.entry
        message = parsed_args.message

-        archive_name = Submit._rand_str(10) + '.zip'
+        archive_name = Submit._make_archive_name(entry)

        # print(archive_name)
Owner:
Stray debug comments here.

Author:
Oops!

        # print(zip)
        #
        # return

        if zip:
            with zipfile.ZipFile(archive_name, 'w', zipfile.ZIP_DEFLATED) as zf:
@@ -128,30 +129,39 @@ def take_action(self, parsed_args):
        if zip:
            os.remove(target_name)
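The diff only shows the start and end of the zip branch. A self-contained sketch of that flow, assuming the archive should hold the entry under its base name and should be deleted once the upload is done. The names zip_and_submit and upload are hypothetical; upload stands in for the actual Kaggle submission request.

```python
import os
import tempfile
import zipfile


def zip_and_submit(entry, upload, archive_path=None):
    """Zip `entry`, pass the archive path to `upload`, then delete the archive.

    `upload` is a hypothetical callable standing in for the real submission
    step; whatever it returns is returned here.
    """
    if archive_path is None:
        archive_path = os.path.splitext(entry)[0] + '.zip'
    # ZIP_DEFLATED actually compresses; the default ZIP_STORED does not.
    with zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        # Store only the base name so the archive has no directory prefix.
        zf.write(entry, arcname=os.path.basename(entry))
    try:
        return upload(archive_path)
    finally:
        # Clean up the temporary archive even if the upload fails.
        os.remove(archive_path)


if __name__ == '__main__':
    workdir = tempfile.mkdtemp()
    csv_path = os.path.join(workdir, 'submission.csv')
    with open(csv_path, 'w') as f:
        f.write('id,label\n1,0\n')

    def fake_upload(path):
        # Pretend upload: just report what the archive contains.
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()

    print(zip_and_submit(csv_path, fake_upload))  # ['submission.csv']
```

Using try/finally keeps the cleanup that os.remove(target_name) performs, while also covering the case where the upload raises.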

    @staticmethod

Owner:
A static method is not the best choice in this case. An underscore prefix marks a method as private, which suggests it accesses the object's private fields, but that's not the case here. I suggest extracting the method to module scope, i.e. moving _make_archive_name out of the Submit class while keeping it in submit.py.

    def _make_archive_name(original_file_path):
        # if original name already has a suffix (csv,txt,etc), remove it
        extension_pattern = r'(^.+)\.(.+)$'

        # file may be in another directory
        original_basename = os.path.basename(original_file_path)

        if re.match(extension_pattern,original_basename):
            archive_name = re.sub(extension_pattern,r'\1.zip',original_basename)
        else:
            archive_name = original_basename+".zip"

        # this is used to prevent caching issues
Owner:
I still don't understand why there would be caching issues. Can you explain the scenario?

Author (@queirozfcom, Oct 27, 2017):
I'm not sure about this, but I think that when we upload two submissions (different files) with the same name (say, "mysubmission.csv"), this may trigger some form of caching at Kaggle.

(They may only check the file name and, if it is a name they have seen before, not even read the file.)

Again, I'm not sure about this, but since caching is so common on websites, I thought it would be a safe thing to do at little cost (just adding a couple of characters to the file name).

But I agree it's not essential.

Owner:
Caching is usually applied to read operations; caching a file upload would be considered a bug.

But I just tested whether Kaggle caches the file, and the answer is no.

I uploaded the benchmark model for the Titanic competition, manually changed the file's content, and uploaded it again to see whether it was cached. The scores Kaggle returned for the two submissions were different. So, basically, it's not an issue.

        string_prefix = uuid.uuid4().hex[:4]

        prefixed_archive_name = string_prefix+"-"+archive_name

        original_directory_path = os.path.dirname(original_file_path)

        return os.path.join(original_directory_path,prefixed_archive_name)
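Following the suggestion to move the helper to module scope, a sketch of what a module-level version in submit.py could look like. It swaps the regex for os.path.splitext (a substitution, not the PR's code; the two agree for ordinary names such as submission.csv) and, given the owner's test showing Kaggle does not cache uploads, makes the random prefix optional. The name make_archive_name and the prefix parameter are hypothetical.

```python
import os
import uuid


def make_archive_name(original_file_path, prefix=False):
    """Return the path of the .zip archive to create for `original_file_path`.

    The archive sits next to the original file and replaces its extension
    (e.g. 'out/submission.csv' -> 'out/submission.zip'). With prefix=True,
    a 4-character random hex string is prepended, as in the PR.
    """
    directory, basename = os.path.split(original_file_path)
    stem, _ext = os.path.splitext(basename)  # 'submission.csv' -> 'submission'
    archive_name = stem + '.zip'
    if prefix:
        archive_name = uuid.uuid4().hex[:4] + '-' + archive_name
    return os.path.join(directory, archive_name)


print(make_archive_name(os.path.join('out', 'submission.csv')))  # out/submission.zip on POSIX
print(make_archive_name('submission'))                           # submission.zip
```

As a plain function it needs neither self nor @staticmethod, which addresses both points in the review comment about private static methods.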

    @staticmethod
    def _str2bool(v):
        """
        parse truthy/falsy strings into booleans

        https://stackoverflow.com/a/43357954/436721
        :param v: the string to be parsed
        :return: a boolean value
        """
-        if v.lower() in ('yes', 'true', 't', 'y', '1'):
+        if v is True or v.lower() in ('yes', 'true', 't', 'y', '1'):
            return True
-        elif v.lower() in ('no', 'false', 'f', 'n', '0'):
+        elif v is False or v.lower() in ('no', 'false', 'f', 'n', '0'):
            return False
        else:
            raise ArgumentTypeError('Boolean value expected.')
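As written, the revised _str2bool still fails for v = False: the test v is True is false, so Python goes on to evaluate False.lower() and raises AttributeError before the elif v is False branch is ever reached. A sketch of one way to handle real booleans first, written as a module-level function (str2bool is a hypothetical name):

```python
from argparse import ArgumentTypeError

TRUTHY = ('yes', 'true', 't', 'y', '1')
FALSY = ('no', 'false', 'f', 'n', '0')


def str2bool(v):
    """Parse a bool or a truthy/falsy string into a bool."""
    # Check real booleans before calling any string methods.
    if isinstance(v, bool):
        return v
    if v.lower() in TRUTHY:
        return True
    if v.lower() in FALSY:
        return False
    raise ArgumentTypeError('Boolean value expected.')


for value in (True, False, 'Yes', 'n'):
    print(repr(value), '->', str2bool(value))
```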

    @staticmethod
    def _rand_str(length):
        """
        this is used to prevent caching issues

        https://stackoverflow.com/a/34017605/436721

        :param length: integer length
        :return: a random string of the given length
        """
        return uuid.uuid4().hex[:length - 1]
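Note that uuid.uuid4().hex[:length - 1] returns length - 1 characters (9 for _rand_str(10)), not the length the docstring promises, and a UUID hex string is only 32 characters long in any case. If the helper is kept, a sketch that returns exactly length characters using the standard-library secrets module (a swap from uuid; rand_str is a hypothetical name):

```python
import secrets


def rand_str(length):
    """Return a random lowercase hex string of exactly `length` characters."""
    # token_hex(n) yields 2*n hex characters, so request enough and trim.
    return secrets.token_hex((length + 1) // 2)[:length]


print(len(rand_str(10)))  # 10
print(len(rand_str(7)))   # 7
```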