A Short task name with a small number of variables lead to failure when dict is used in TOML Config file #2771

Closed
fullflu opened this issue Aug 20, 2019 · 2 comments
@fullflu
fullflu commented Aug 20, 2019

Overview

I found additional errors while reproducing issue #2538.
A short task name combined with a small number of variables leads to failure (although the task still outputs the expected results).
I hope PR #2540 will take this error into account, @orsinium.

I tried five ways to parse a dict (key-value pairs) from the TOML config.

  • Case 1 (TestCase1): Parsing with luigi.Parameter inside luigi.Task.
    -> Task fails but still outputs the expected results. This is the same result as in "Luigi task does not accept dict defined in TOML Config file" (#2538).

  • Case 2 (TestCase2): Adding nine key-value pairs in the config file.
    -> Task succeeds with a warning and outputs the expected results.

  • Case 3 (TC3): Using the same config values as Case 2 with a shorter task name (TC3).
    -> Task fails as in Case 1.

  • Case 4 (TestCase4): Adding eight key-value pairs in the config file.
    -> Task fails as in Case 1.

  • Case 5 (TestCase5): Adding eight key-value pairs in the config file, two of them with longer key and value names.
    -> Task succeeds with a warning and outputs the expected results.

Environment

  • python 3.6.1
  • luigi 2.8.8

Files

Luigi Task python script

testcase.py

import luigi


class TestCase1(luigi.Task):
    text = luigi.Parameter()
    dic = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget("data/case1.txt")

    def requires(self):
        return None

    def run(self):
        with self.output().open('w') as out_file:
            out_file.write('text: {}, dict: {}'.format(self.text, self.dic))


class TestCase2(luigi.Task):
    dic_params = luigi.Parameter()
    text = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget("data/case2.txt")

    def requires(self):
        return None

    def run(self):
        with self.output().open('w') as out_file:
            out_file.write('text: {}, dict: {}'.format(self.text, self.dic_params))


class TC3(luigi.Task):
    dic_params = luigi.Parameter()
    text = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget("data/case3.txt")

    def requires(self):
        return None

    def run(self):
        with self.output().open('w') as out_file:
            out_file.write('text: {}, dict: {}'.format(self.text, self.dic_params))


class TestCase4(luigi.Task):
    dic_params = luigi.Parameter()
    text = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget("data/case4.txt")

    def requires(self):
        return None

    def run(self):
        with self.output().open('w') as out_file:
            out_file.write('text: {}, dict: {}'.format(self.text, self.dic_params))


class TestCase5(luigi.Task):
    dic_params = luigi.Parameter()
    text = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget("data/case5.txt")

    def requires(self):
        return None

    def run(self):
        with self.output().open('w') as out_file:
            out_file.write('text: {}, dict: {}'.format(self.text, self.dic_params))

TOML config file

testcase.toml

[TestCase1]
text = "sample text"

[TestCase1.dic]
key1 = 'value1'
key2 = 'value2'


[TestCase2]
text = "sample text"

[TestCase2.dic_params]
key1 = 'value1'
key2 = 'value2'
key3 = 'value3'
key4 = 'value4'
key5 = 'value5'
key6 = 'value6'
key7 = 'value7'
key8 = 'value8'
key9 = 'value9'


[TC3]
text = "sample text"

[TC3.dic_params]
key1 = 'value1'
key2 = 'value2'
key3 = 'value3'
key4 = 'value4'
key5 = 'value5'
key6 = 'value6'
key7 = 'value7'
key8 = 'value8'
key9 = 'value9'

[TestCase4]
text = "sample text"

[TestCase4.dic_params]
key1 = 'value1'
key2 = 'value2'
key3 = 'value3'
key4 = 'value4'
key5 = 'value5'
key6 = 'value6'
key7 = 'value7'
key8 = 'value8'


[TestCase5]
text = "sample text"

[TestCase5.dic_params]
key1 = 'value1'
key2 = 'value2'
key3 = 'value3'
key4 = 'value4'
key5 = 'value5'
key6 = 'value6'
key7_long_long = 'valvalval7'
key8_long_long = 'valvalval8'

Case 1

Parsing with luigi.Parameter inside luigi.Task.

PYTHONPATH='.' LUIGI_CONFIG_PARSER='toml' LUIGI_CONFIG_PATH='examples/testcase.toml' python3 -m luigi --module examples.testcase TestCase1 --local-scheduler

******:luigi ************$ PYTHONPATH='.' LUIGI_CONFIG_PARSER='toml' LUIGI_CONFIG_PATH='examples/testcase.toml' python3 -m luigi --module examples.testcase TestCase1 --local-scheduler
/Users/************/projects/luigi/luigi/parameter.py:286: UserWarning: Parameter "dic" with value "{'key1': 'value1', 'key2': 'value2'}" is not of type string.
  warnings.warn('Parameter "{}" with value "{}" is not of type string.'.format(param_name, param_value))
DEBUG: Checking if TestCase1(text=sample text, dic={'key1': 'value1', 'key2': 'value2'}) is complete
INFO: Informed scheduler that task   TestCase1___key1____value1_sample_text_7b550efa06   has status   DONE
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
INFO: Worker Worker(salt=826253823, workers=1, host=******.local, username=************, pid=53688) was stopped. Shutting down Keep-Alive thread
ERROR: Uncaught exception in luigi
Traceback (most recent call last):
  File "/Users/************/projects/luigi/luigi/retcodes.py", line 75, in run_with_retcodes
    worker = luigi.interface._run(argv).worker
  File "/Users/************/projects/luigi/luigi/interface.py", line 211, in _run
    return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
  File "/Users/************/projects/luigi/luigi/interface.py", line 174, in _schedule_and_run
    luigi_run_result = LuigiRunResult(worker, success)
  File "/Users/************/projects/luigi/luigi/execution_summary.py", line 79, in __init__
    self.summary_text = _summary_wrap(_summary_format(summary_dict, worker))
  File "/Users/************/projects/luigi/luigi/execution_summary.py", line 414, in _summary_format
    str_output += '{0}\n'.format(_get_str(group_tasks[status], status in _PENDING_SUB_STATUSES))
  File "/Users/************/projects/luigi/luigi/execution_summary.py", line 204, in _get_str
    params = _get_set_of_params(tasks)
  File "/Users/************/projects/luigi/luigi/execution_summary.py", line 247, in _get_set_of_params
    params[param] = {getattr(task, param[0]) for task in tasks}
  File "/Users/************/projects/luigi/luigi/execution_summary.py", line 247, in <setcomp>
    params[param] = {getattr(task, param[0]) for task in tasks}
TypeError: unhashable type: 'dict'
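
The last frame of the traceback builds a set from parameter values, and a plain dict cannot be a set element. The sketch below reproduces only that step in plain Python, without importing luigi. `FakeTask`, `collect_values`, and `freeze` are hypothetical names introduced for illustration; the tuple-based `freeze` is one possible way to make the value hashable and is not claimed to be what PR #2540 does.

```python
class FakeTask:
    """Hypothetical stand-in for a task whose parameter holds a raw dict."""

    def __init__(self, dic):
        self.dic = dic


def collect_values(tasks, name):
    # Same shape as the set comprehension shown in the traceback:
    # every parameter value must be hashable to become a set element.
    return {getattr(task, name) for task in tasks}


tasks = [FakeTask({'key1': 'value1', 'key2': 'value2'})]

try:
    collect_values(tasks, 'dic')
    error_message = None
except TypeError as exc:
    # A dict defines no hash, so building the set raises TypeError.
    error_message = str(exc)

print(error_message)


def freeze(d):
    # Illustrative workaround: a sorted tuple of items is hashable
    # as long as the keys and values themselves are hashable.
    return tuple(sorted(d.items()))


frozen_tasks = [FakeTask(freeze(task.dic)) for task in tasks]
frozen_values = collect_values(frozen_tasks, 'dic')
print(frozen_values)
```

This only shows why the summary step can fail after the task itself has already run and written its output.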

Case 2

Adding nine key-value pairs in the config file.

******:luigi ************$ PYTHONPATH='.' LUIGI_CONFIG_PARSER='toml' LUIGI_CONFIG_PATH='examples/testcase.toml' python3 -m luigi --module examples.testcase TestCase2 --local-scheduler
/Users/************/projects/luigi/luigi/parameter.py:286: UserWarning: Parameter "dic_params" with value "{'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7': 'value7', 'key8': 'value8', 'key9': 'value9'}" is not of type string.
  warnings.warn('Parameter "{}" with value "{}" is not of type string.'.format(param_name, param_value))
DEBUG: Checking if TestCase2(dic_params={'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7': 'value7', 'key8': 'value8', 'key9': 'value9'}, text=sample text) is complete
INFO: Informed scheduler that task   TestCase2___key1____value1_sample_text_9e00d13a2c   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 53756] Worker Worker(salt=815462199, workers=1, host=******.local, username=************, pid=53756) running   TestCase2(dic_params={'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7': 'value7', 'key8': 'value8', 'key9': 'value9'}, text=sample text)
INFO: [pid 53756] Worker Worker(salt=815462199, workers=1, host=******.local, username=************, pid=53756) done      TestCase2(dic_params={'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7': 'value7', 'key8': 'value8', 'key9': 'value9'}, text=sample text)
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   TestCase2___key1____value1_sample_text_9e00d13a2c   has status   DONE
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
INFO: Worker Worker(salt=815462199, workers=1, host=******.local, username=************, pid=53756) was stopped. Shutting down Keep-Alive thread
INFO: 
===== Luigi Execution Summary =====

Scheduled 1 tasks of which:
* 1 ran successfully:
    - 1 TestCase2(...)

This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====

Case 3

Using the same config file as Case 2 and the shorter task name (TC3).

******:luigi ************$ PYTHONPATH='.' LUIGI_CONFIG_PARSER='toml' LUIGI_CONFIG_PATH='examples/testcase.toml' python3 -m luigi --module examples.testcase TC3 --local-scheduler
/Users/************/projects/luigi/luigi/parameter.py:286: UserWarning: Parameter "dic_params" with value "{'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7': 'value7', 'key8': 'value8', 'key9': 'value9'}" is not of type string.
  warnings.warn('Parameter "{}" with value "{}" is not of type string.'.format(param_name, param_value))
DEBUG: Checking if TC3(dic_params={'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7': 'value7', 'key8': 'value8', 'key9': 'value9'}, text=sample text) is complete
INFO: Informed scheduler that task   TC3___key1____value1_sample_text_9e00d13a2c   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 53822] Worker Worker(salt=216360697, workers=1, host=******.local, username=************, pid=53822) running   TC3(dic_params={'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7': 'value7', 'key8': 'value8', 'key9': 'value9'}, text=sample text)
INFO: [pid 53822] Worker Worker(salt=216360697, workers=1, host=******.local, username=************, pid=53822) done      TC3(dic_params={'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7': 'value7', 'key8': 'value8', 'key9': 'value9'}, text=sample text)
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   TC3___key1____value1_sample_text_9e00d13a2c   has status   DONE
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
INFO: Worker Worker(salt=216360697, workers=1, host=******.local, username=************, pid=53822) was stopped. Shutting down Keep-Alive thread
ERROR: Uncaught exception in luigi
Traceback (most recent call last):
  File "/Users/************/projects/luigi/luigi/retcodes.py", line 75, in run_with_retcodes
    worker = luigi.interface._run(argv).worker
  File "/Users/************/projects/luigi/luigi/interface.py", line 211, in _run
    return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
  File "/Users/************/projects/luigi/luigi/interface.py", line 174, in _schedule_and_run
    luigi_run_result = LuigiRunResult(worker, success)
  File "/Users/************/projects/luigi/luigi/execution_summary.py", line 79, in __init__
    self.summary_text = _summary_wrap(_summary_format(summary_dict, worker))
  File "/Users/************/projects/luigi/luigi/execution_summary.py", line 414, in _summary_format
    str_output += '{0}\n'.format(_get_str(group_tasks[status], status in _PENDING_SUB_STATUSES))
  File "/Users/************/projects/luigi/luigi/execution_summary.py", line 204, in _get_str
    params = _get_set_of_params(tasks)
  File "/Users/************/projects/luigi/luigi/execution_summary.py", line 247, in _get_set_of_params
    params[param] = {getattr(task, param[0]) for task in tasks}
  File "/Users/************/projects/luigi/luigi/execution_summary.py", line 247, in <setcomp>
    params[param] = {getattr(task, param[0]) for task in tasks}
TypeError: unhashable type: 'dict'

Case 4

Adding eight key-value pairs in the config file.

******:luigi ************$ PYTHONPATH='.' LUIGI_CONFIG_PARSER='toml' LUIGI_CONFIG_PATH='examples/testcase.toml' python3 -m luigi --module examples.testcase TestCase4 --local-scheduler
/Users/************/projects/luigi/luigi/parameter.py:286: UserWarning: Parameter "dic_params" with value "{'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7': 'value7', 'key8': 'value8'}" is not of type string.
  warnings.warn('Parameter "{}" with value "{}" is not of type string.'.format(param_name, param_value))
DEBUG: Checking if TestCase4(dic_params={'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7': 'value7', 'key8': 'value8'}, text=sample text) is complete
INFO: Informed scheduler that task   TestCase4___key1____value1_sample_text_576e01ac43   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 53884] Worker Worker(salt=816352291, workers=1, host=******.local, username=************, pid=53884) running   TestCase4(dic_params={'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7': 'value7', 'key8': 'value8'}, text=sample text)
INFO: [pid 53884] Worker Worker(salt=816352291, workers=1, host=******.local, username=************, pid=53884) done      TestCase4(dic_params={'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7': 'value7', 'key8': 'value8'}, text=sample text)
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   TestCase4___key1____value1_sample_text_576e01ac43   has status   DONE
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
INFO: Worker Worker(salt=816352291, workers=1, host=******.local, username=************, pid=53884) was stopped. Shutting down Keep-Alive thread
ERROR: Uncaught exception in luigi
Traceback (most recent call last):
  File "/Users/************/projects/luigi/luigi/retcodes.py", line 75, in run_with_retcodes
    worker = luigi.interface._run(argv).worker
  File "/Users/************/projects/luigi/luigi/interface.py", line 211, in _run
    return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
  File "/Users/************/projects/luigi/luigi/interface.py", line 174, in _schedule_and_run
    luigi_run_result = LuigiRunResult(worker, success)
  File "/Users/************/projects/luigi/luigi/execution_summary.py", line 79, in __init__
    self.summary_text = _summary_wrap(_summary_format(summary_dict, worker))
  File "/Users/************/projects/luigi/luigi/execution_summary.py", line 414, in _summary_format
    str_output += '{0}\n'.format(_get_str(group_tasks[status], status in _PENDING_SUB_STATUSES))
  File "/Users/************/projects/luigi/luigi/execution_summary.py", line 204, in _get_str
    params = _get_set_of_params(tasks)
  File "/Users/************/projects/luigi/luigi/execution_summary.py", line 247, in _get_set_of_params
    params[param] = {getattr(task, param[0]) for task in tasks}
  File "/Users/************/projects/luigi/luigi/execution_summary.py", line 247, in <setcomp>
    params[param] = {getattr(task, param[0]) for task in tasks}
TypeError: unhashable type: 'dict'

Case 5

Adding eight key-value pairs in the config file, two of them with longer key and value names.

******:luigi ************$ PYTHONPATH='.' LUIGI_CONFIG_PARSER='toml' LUIGI_CONFIG_PATH='examples/testcase.toml' python3 -m luigi --module examples.testcase TestCase5 --local-scheduler
/Users/************/projects/luigi/luigi/parameter.py:286: UserWarning: Parameter "dic_params" with value "{'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7_long_long': 'valvalval7', 'key8_long_long': 'valvalval8'}" is not of type string.
  warnings.warn('Parameter "{}" with value "{}" is not of type string.'.format(param_name, param_value))
DEBUG: Checking if TestCase5(dic_params={'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7_long_long': 'valvalval7', 'key8_long_long': 'valvalval8'}, text=sample text) is complete
INFO: Informed scheduler that task   TestCase5___key1____value1_sample_text_5bb767f0de   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 53942] Worker Worker(salt=630670131, workers=1, host=******.local, username=************, pid=53942) running   TestCase5(dic_params={'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7_long_long': 'valvalval7', 'key8_long_long': 'valvalval8'}, text=sample text)
INFO: [pid 53942] Worker Worker(salt=630670131, workers=1, host=******.local, username=************, pid=53942) done      TestCase5(dic_params={'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7_long_long': 'valvalval7', 'key8_long_long': 'valvalval8'}, text=sample text)
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   TestCase5___key1____value1_sample_text_5bb767f0de   has status   DONE
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
INFO: Worker Worker(salt=630670131, workers=1, host=******.local, username=************, pid=53942) was stopped. Shutting down Keep-Alive thread
INFO: 
===== Luigi Execution Summary =====

Scheduled 1 tasks of which:
* 1 ran successfully:
    - 1 TestCase5(...)

This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====
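
One possible explanation for the pattern in Cases 2 to 5, offered as an assumption rather than a confirmed diagnosis: the successful runs print the task as `TestCase2(...)`, which suggests the summary formatter abbreviates tasks whose string form is long and only inspects the parameter values (hitting the unhashable dict) for shorter ones. The sketch below measures the string form shown in the logs for each case and compares it with an assumed limit of 200 characters. The limit, `ASSUMED_LIMIT`, `task_repr`, and `make_dict` are hypothetical and not taken from this report.

```python
ASSUMED_LIMIT = 200  # assumption: abbreviation threshold, not confirmed here


def task_repr(name, dic, text='sample text'):
    # Rebuilds the task string as it appears in the DEBUG log lines.
    return '{}(dic_params={}, text={})'.format(name, dic, text)


def make_dict(n):
    # key1..keyN mapped to value1..valueN, as in testcase.toml.
    return {'key{}'.format(i): 'value{}'.format(i) for i in range(1, n + 1)}


case5 = make_dict(6)
case5['key7_long_long'] = 'valvalval7'
case5['key8_long_long'] = 'valvalval8'

cases = {
    'TestCase2': make_dict(9),
    'TC3': make_dict(9),
    'TestCase4': make_dict(8),
    'TestCase5': case5,
}

lengths = {name: len(task_repr(name, dic)) for name, dic in cases.items()}
abbreviated = {name: n > ASSUMED_LIMIT for name, n in lengths.items()}

for name in cases:
    print(name, lengths[name], 'abbreviated' if abbreviated[name] else 'expanded')
```

Under this assumption, the two cases that exceed the limit (TestCase2 and TestCase5) are the ones that succeeded, and the two below it (TC3 and TestCase4) are the ones that raised the TypeError. Case 1 uses a different parameter name and a much shorter string, so it would also fall below the limit.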
@fullflu
Author

fullflu commented Aug 20, 2019

@orsinium
FYI: I confirmed that PR #2540 (somehow) solved these issues.
All tasks ran successfully, but I have no idea why these phenomena occurred...

@fullflu fullflu changed the title A Shorter task name with a small number of variables lead to failure when dict is used in TOML Config file A Short task name with a small number of variables lead to failure when dict is used in TOML Config file Aug 20, 2019
@stale
stale bot commented Dec 18, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If closed, you may revisit when your time allows and reopen! Thank you for your contributions.

@stale stale bot added the wontfix label Dec 18, 2019
@stale stale bot closed this as completed Jan 1, 2020