Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Global parameters can be overridden when instantiating a task #423

Closed
wants to merge 1 commit into from

Conversation

erikbern
Copy link
Contributor

@erikbern erikbern commented Sep 2, 2014

Also removed a couple of ways global parameters are special. The only feature separating global variables is now that they can be set on the command line for all tasks. But other than that, global parameters behaves like any other parameter.

There was also some issues with leaky state from one test to another - interface_test.py was leaking a mocked no_lock=True parameter which parameter_test.py was relying on. Removed the leak and added --no-lock everywhere in parameter_test.py

@erikbern erikbern force-pushed the erikbern/global-params-override branch 2 times, most recently from f6ecf52 to cbb4991 Compare September 2, 2014 16:48
Also removed a couple of ways global parameters are special. The only feature separating global variables is now that they can be set on the command line for all tasks. But other than that, global parameters behaves like any other parameter.

There was also some issues with leaky state from one test to another - interface_test.py was leaking a mocked no_lock=True parameter which parameter_test.py was relying on. Removed the leak and added --no-lock everywhere in parameter_test.py
@erikbern erikbern force-pushed the erikbern/global-params-override branch from cbb4991 to c419e7f Compare September 2, 2014 17:57
@Tarrasch
Copy link
Contributor

Tarrasch commented Sep 2, 2014

@sisidra, is this related to our issues with parameters the other month? --scrub_pool vs --pool

@sisidra
Copy link
Contributor

sisidra commented Sep 3, 2014

@Tarrasch
I really hate global parameter implementation and notion itself. As @freider said, it's more like configuration.
Also, problem is that you can not have global parameter with same name in two different jobs. That is case for Hadoop task and Crunch task with --pool. It is defined in Hadoop task, so I can not use or define it in Crunch task (without hacks or ugly workaround code). I don't think this patch solves it.

@erikbern
Copy link
Contributor Author

erikbern commented Sep 3, 2014

Regarding the issue with pool – can you make Hadoop tasks and Crunch tasks share the parameter? Eg.

class TaskA(Task):
    p = Parameter(is_global=True)

class TaskB(Task):
    p = TaskA.p

@sisidra
Copy link
Contributor

sisidra commented Sep 3, 2014

Yes but I see it as hack and smell.
Referencing BaseHadoopJobTask.pool from our Crunch Task is not ok as they have no other common parts.
Better would be refactoring out BaseHadoopJobTask.pool into some PoolContainer (same as EnvironmentParamsContainer), but using Task to hold and reference parameters is smell in design anyway.
imo. :)

@erikbern
Copy link
Contributor Author

erikbern commented Sep 3, 2014

I'm not sure I agree. Given that you want to use global parameters (which I'm not convinced about) – how would you otherwise support two classes with the same parameter?

  1. Make sharing implicit by name
  2. Make sharing explicit by using the same Parameter object
  3. Inheritance

I think #2 makes a lot more sense, and that was part of the original design.

Did you refer to the use of parameters on tasks at all? Can you elaborate why you think it's "smell"?

@erikbern
Copy link
Contributor Author

erikbern commented Sep 3, 2014

Btw – I think the discussion above is pretty unrelated to this PR, so please let me know if you don't think I should merge this, or I'll merge it in a few days

@sisidra
Copy link
Contributor

sisidra commented Sep 4, 2014

-2
It will break existing code:

class MyTask(luigi.hadoop.JobTask):
  date = luigi.DateParameter()

MyTask(some_date) # luigi.parameter.MissingParameterException

I would be fine with implementation that adds global params overrides only using kwargs.

Discussion about global params in general is valid also.
Agree, 1# is not ok.
But 2# is even more not ok, referencing unrelated class (JobTask from CrunchTask) is exactly what @DAVW said about "do not tie together different modules"!
3# is just impossible.

4# is using parameter holder same as EnvironmentParamsContainer (and we probably will refactor to do that)

class ResourceManagerParams(Task):
  pool = luigi.Parameter()

Problem with this is that in my mind parameter container is not a Task. It is semantically wrong.
As I see no good solution, I conclude that global parameters design is flawed and I am not a fan of integrating them even deeper into Task.
But I can live with 4# given patch does not break existing code and global parameters are not positional but only kw.

@erikbern
Copy link
Contributor Author

erikbern commented Sep 4, 2014

Thanks for catching the issue with positional global params, I'll see what I can do about it.

I see what you are saying about the Task-Parameter integration. I don't think it would be super hard to relax the condition that every parameter has to live on a Task – would be fine if global parameters live in the global scope. There's a separate discussion whether we want global parameters at all, of course. I think they're bad for the same reason as they are bad in most object oriented languages

@erikbern erikbern closed this Sep 4, 2014
@erikbern erikbern deleted the erikbern/global-params-override branch January 21, 2015 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants