Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Visible parameter for luigi.Parameters #2278

Merged
merged 26 commits into from
Aug 8, 2018

Conversation

nryanov
Copy link
Contributor

@nryanov nryanov commented Nov 7, 2017

Description

Visible parameter for luigi.Parameter class which is needed to show/hide some parameters from WEB page such as passwords to database

Motivation and Context

Well, in my case this change is crucial because of some of my tasks connect to database and contain passwords for it which i don't want to show it from web page to all.

In code it will look like this:

class MyTask(luigi.Task):
    secret = luigi.Parameter(visible=False)    # default value for visible - True
    public = luigi.Parameter()

Have you tested this? If so, how?

I tested it on my tasks and it works fine. It just removes this parameters from node represents current task, so it shouldn't break any compatibility

(Below images are just examples)
Before:
image
After:
image

Maybe, it will be helpful for others too in similar case

Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to challenge me and explain your train of thought. Thanks!

luigi/worker.py Outdated
@@ -775,7 +775,7 @@ def _add(self, task, is_complete):
runnable=runnable,
priority=task.priority,
resources=task.process_resources(),
params=task.to_str_params(),
params=task.to_str_params(only_visible=True),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand here why you set the default to false, but force the worker to pass true. I feel that whatever the result may be, it shouldn't affect people who don't have this value set on their Parameters.

Thus if visible is set to true, then only_visible should be true by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlstadther Thank you for your response!

I feel that whatever the result may be, it shouldn't affect people who don't have this value set on their Parameters.

It will not affect such people because in luigi.Parameter's constructor visible value by default is True

I'm not sure I understand here why you set the default to false, but force the worker to pass true

I set the default value to false, because method to_str_params is called from other places and in such cases it expects to recieve all parameters from task regardless of values like significant or visible.

For example,

    def _announce_scheduling_failure(self, task, expl):
            self._scheduler.announce_scheduling_failure(
                worker=self._id,
                task_name=str(task),
                family=task.task_family,
                params=task.to_str_params(only_significant=True), # regadless of visible, will return only significant params
                expl=expl,
                owners=task._owner_list(),
            )
     self._scheduler.add_task(
                worker=self._id,
                task_id=task.task_id,
                module=get_work_response.get('task_module'),
                family=get_work_response['task_family'],
                params=task.to_str_params(), # as before return all params 
                status=RUNNING,
                batch_id=get_work_response['batch_id'],
            )

Thus if visible is set to true, then only_visible should be true by default.

In general, i tried to copy significant logic but for another purpose and visible is similar to it

def __init__(self, default=_no_value, is_global=False, significant=True, description=None,
                 config_path=None, positional=True, always_in_help=False, batch_method=None):
    def __init__(self, default=_no_value, is_global=False, significant=True, description=None,
                 config_path=None, positional=True, always_in_help=False, batch_method=None, visible=True):

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So then, if i understand this correctly... this _add() method is what the worker uses to push the definition of each task to the web view? So by removing non-visible params from the task's tostr, you remove them from visibility?

Are there any negative side-effects if i have two nearly identical tasks which only differ in username/password? They would have the same result from to_str_params() when they are not the same task.

Copy link
Contributor Author

@nryanov nryanov Nov 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So then, if i understand this correctly... this _add() method is what the worker uses to push the definition of each task to the web view? So by removing non-visible params from the task's tostr, you remove them from visibility?

Exactly.
In my case, the main purpose was to hide password, username, host and port only from web view in tasks like that:

class SomeAction(luigi.Task):
    password = luigi.Parameter(visible=False)
    username= luigi.Parameter(visible=False)
    host= luigi.Parameter(visible=False)
    port= luigi.Parameter(visible=False)
    id = luigi.IntParameter()
    date = luigi.DateParameter()

    def output(self):
        return luigi.PostgresTarget(...)

    def requires(self):
        return [OtherTasks(self.password, self.username, self.port, self.host)]
...

In the result, in web page visible only ID and Date. I tested it on our dev place with different parameters and it works fine for me.
Also, for additional testing i pass to luigi.interface.run my own worker_scheduler_factory to produce custom scheduler\workers and i place some logs into it to control task params

Are there any negative side-effects if i have two nearly identical tasks which only differ in username/password? They would have the same result from to_str_params() when they are not the same task.

Probably, i don't correctly understand what do you mean by They would have the same result from to_str_params() when they are not the same task. For such tasks, to_str_params would have some affect only for params where visible=False and only for web page, not for the task instance

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I think i understand.

luigi/task.py Outdated
@@ -469,14 +469,14 @@ def from_str_params(cls, params_str):

return cls(**kwargs)

def to_str_params(self, only_significant=False):
def to_str_params(self, only_significant=False, only_visible=False):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[see worker comment]

Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this lgtm.

Could you @ mention one or two others for review?

@nryanov
Copy link
Contributor Author

nryanov commented Nov 9, 2017

Thank you for your approval @dlstadther !
Yes, I can mention other reviewers, but I don't know many of them
@Tarrasch , @prncc

Sorry, If I mentioned someone incorrectly

Copy link
Contributor

@Tarrasch Tarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good start on solving this everlasting issue. I like the idea of introducing a new option instead of changing significant


However this PR don't have any tests nor docs yet. :)

@nryanov
Copy link
Contributor Author

nryanov commented Nov 12, 2017

@Tarrasch , If this is the only problem at this moment i will add needed tests and docs as soon as possible =)

@nryanov
Copy link
Contributor Author

nryanov commented Nov 14, 2017

@Tarrasch , i've added docs and tests for visible param.
In tests i tried to show that visible param doesn't affect on core process of resolving params and that it also works independently of significant.

@Tarrasch
Copy link
Contributor

Looks good so far. Hmm... I didn't imagine it would be "this simple" solving this so long overdue issue. Maybe let this PR be open for a week or so and let people chime in?

Great job so far. :)

@nryanov
Copy link
Contributor Author

nryanov commented Nov 15, 2017

Thank you!

Maybe let this PR be open for a week or so and let people chime in?

Ok =)

@TMaYaD
Copy link
Contributor

TMaYaD commented Nov 27, 2017

I'm thinking out aloud here: I think we should consider adding multiple levels of privacy.
I can think of quite a few use cases from different discussions in this repo.

  1. significant vs non-significant: This effects the Task identity and everyone understands this well.
  2. Visible Vs Invisible: For this I'm thinking purely in display terms. Oftentimes I have so many parameters(mostly with default values), that clutter up my dashboard that I wish to hide them from dashboard.
  3. Private vs Public: These are the security related concerns like usernames, passwords, API Tokens etc. that we don't want to leak beyond the workers. Esp considering scheduler does not have any auth-auth built in.

It's easy to generalise one category into another. Ex. At first glance one would think invisible params are always insignificant. But I have one use case in our codebase which isn't so. Similarly any attempt to generalise the categories might result in long arguments. But one generalisation that does make sense(To me; Correct me if I'm wrong) is private params being invisible since they are not even available to the scheduler to display or hide. Consequently, it might be a good Idea to add a third state to visible.

@nryanov
Copy link
Contributor Author

nryanov commented Nov 29, 2017

Visible Vs Invisible: For this I'm thinking purely in display terms. Oftentimes I have so many parameters(mostly with default values), that clutter up my dashboard that I wish to hide them from dashboard.

But do we need to store invisible params into history and\or allow to access them from remote scheduler?

Private vs Public: These are the security related concerns like usernames, passwords, API Tokens etc. that we don't want to leak beyond the workers. Esp considering scheduler does not have any auth-auth built in.

private params being invisible since they are not even available to the scheduler to display or hide

At this moment, params with visible=False are like private params as they are not accessible from within remote scheduler. If visible=True (by default), then params will act like public params in your example.

Consequently, it might be a good Idea to add a third state to visible

I can imagine these states or similar to them:
public - accessible for anyone from anywhere
protected - accessible for remote scheduler but not visible from web view
private - accessible only within worker

@TMaYaD
Copy link
Contributor

TMaYaD commented Nov 29, 2017

But do we need to store invisible params into history and\or allow to access them from remote scheduler?

Yes. We do have some such params in our tasks where machine cares about the params but humans don't.

  • One example is a param called number_of_chunks which is used to split a task into smaller chunks in map -> reduce scenario. The param is redundant because it's visible from the DAG anyway.
  • Another example has to do with proxying params. There is an intermediary step that doesn't use the params itself but proxies them from downstream to upsteam. Again, we don't need to see most of those params on the task but they are crucial for proper functioning.

I can imagine these states or similar to them:
public - accessible for anyone from anywhere
protected - accessible for remote scheduler but not visible from web view
private - accessible only within worker

Exactly what I had in mind. However I would rename protected as hidden to reflect the intent better.

@nryanov
Copy link
Contributor Author

nryanov commented Nov 29, 2017

Well, i've added some fixes for hidden params functionality.
It's only a sketch by now.

I didn't resolve the problem with tuple like (param_value, visibility_type).
Also i want to add something like enum (or class like Event now) for these visibility types:

class VisibilityModifier(object):
    public = ...
    private = ...
    hidden = ...

For now, it works like described above:

public - accessible for anyone from anywhere
hidden - accessible for remote scheduler but not visible from web view
private - accessible only within worker

And also, hidden params are recorded into task_history_params.

Later, i'll add unit tests for it


Probably, _serialize_task isn't the best place where we should filter hidden params for web view.

Maybe add additional param inside Task class (in scheduler.py) like public_params to avoid extra overhead on filtering hidden params in this function?

@Tarrasch
Copy link
Contributor

Tarrasch commented Dec 9, 2017

@TMaYaD , @Gr1f0n6x, Thank you two so much for thinking this through! I really like that you have this discussion and I like the direction it's going!

@nryanov
Copy link
Contributor Author

nryanov commented Dec 17, 2017

Finally, i came up with (i believe not bad ^_^) solution.
The general idea is to send to the central scheduler not pure parameter values, but with their visible modifiers. Because of that i've created additional method to_str_params_with_visibility which returns dictionary with tuples (value, modifier).
This method is used only in one place: Worker._add(...).
Also, to reduce overhead on filtering public parameters, i've added two fields in Task class in scheduler.py: public_params and hidden_params. Probably, hidden_params in this case is unnecessary, but maybe for some cases can be useful in future.
By default, parameters with visible=2 are private and aren't sent to the central scheduler.

I tried to not modify old code to not break the compatibility with existing code.

I've tested this in two ways:

  1. The common unit tests in visible_parameters_tests.py
    image

  2. I ran some tasks:

import luigi
from luigi.contrib import postgres
import time


class Dep(luigi.Task):
    param1 = luigi.Parameter(default='public_dep')
    param2 = luigi.Parameter(default='private_dep', visible=2)
    param3 = luigi.Parameter(default='hidden_dep', visible=1)
    param4 = luigi.Parameter(default='private_dep', visible=2)
    param5 = luigi.Parameter(default='hidden_dep', visible=1)
    time = luigi.Parameter(default="{}".format(time.time()))

    def output(self):
        return postgres.PostgresTarget(host="localhost", password="", database="postgres",
                                       user="postgres", table="dep",
                                       update_id="dep_{}")

    def requires(self):
        return []

    def run(self):
        print("Run dep")
        self.output().touch()


class Dep2(luigi.Task):
    param1 = luigi.Parameter(default='public_dep2')
    param2 = luigi.Parameter(default='private_dep2', visible=2)
    param3 = luigi.Parameter(default='hidden_dep2', visible=1)
    param4 = luigi.Parameter(default='private_dep2', visible=2)
    param5 = luigi.Parameter(default='public_dep2')

    def output(self):
        return postgres.PostgresTarget(host="localhost", password="", database="postgres",
                                       user="postgres", table="dep2",
                                       update_id="dep2_{}")

    def requires(self):
        return []

    def run(self):
        print("Run dep 2")
        self.output().touch()


class Test(luigi.Task):
    public = luigi.Parameter(default='public')
    hidden = luigi.Parameter(default='hidden', visible=1)
    private = luigi.Parameter(default='private', visible=2)

    def output(self):
        return postgres.PostgresTarget(host="localhost", password="", database="postgres",
                                       user="postgres", table="test",
                                       update_id="test_{}".format(time.time()))

    def requires(self):
        return [Dep()]

    def run(self):
        yield Dep2()
        print("Test task")
        self.output().touch()


if __name__ == '__main__':
    luigi.interface.run(main_task_cls=Test)

Result:
image

In my database:
image


The one thing i still can't resolve it is the constants for visible modifiers.
The current solution is simply put 0,1 or 2 as visible value, but, as i think, it is not clear and, maybe, it would be better if some Enum or something like that will exist with human-readable values


Many tests failed =/
I'll try to figure out why and fix it

@nryanov nryanov closed this Dec 17, 2017
@nryanov nryanov reopened this Dec 17, 2017
@Tarrasch
Copy link
Contributor

Yes. I think using an enum is a sound abstraction for this. :)

@thisiscab
Copy link
Contributor

Hi, @Gr1f0n6x I would be interested in having something like such work due to how we're using Luigi it would be beneficial for us to hide some parameters from the admin interface for convenience.

Did you happen to make any progress on this outside of this repo? Is this still something that is on your mind? What are the things that would left in order to consider this feature complete?

Thanks!

@nryanov
Copy link
Contributor Author

nryanov commented Jun 10, 2018

Hi @cabouffard ,

What are the things that would left in order to consider this feature complete?

The main problem now is to add 3rd state (hidden, in this case) for parameters visibility, because some of my solutions change core logic and it break some tests.

If you want just fully hide some parameters and show others - you can use first commit for it (we still use this solution in prod).

Also, i have a vacation now, so, i can continue to work on it and i believe, that i'll finish soon this feature =)

@thisiscab
Copy link
Contributor

@Gr1f0n6x Thanks for the feedback! We're eager to use that feature on our pipeline! :)

@Tarrasch
Copy link
Contributor

Still want to say I'm still looking forward to this being implemented. :)

@nryanov
Copy link
Contributor Author

nryanov commented Jul 30, 2018

@Tarrasch i've fixed one of the failed tests. The problem was in IntEnum serialization.
But i have no idea why doctest always fails. I'm not sure, but i think that it is concerned with IntEnum class, because first time this test failed when i replace simple enum-like class by IntEnum (maybe it's just coincidence).
I will try to reproduce it locally and fix it

@dlstadther
Copy link
Collaborator

@Gr1f0n6x Docs are failing for you due to your import enum. I've ran into issues before with tests where a downstream import caused such an error.

Basically enum is Python 3 only

I mention it here when it happened for another PR.

@nryanov
Copy link
Contributor Author

nryanov commented Jul 31, 2018

@dlstadther , @Tarrasch would it be ok if i replace enum by simple enum-like class to solve this issue? Or there is another way to solve it?

@dlstadther
Copy link
Collaborator

You can use the enum class that's built into python. (there's a 2.7 backport we're already using)

Just looked through this PR history and found @Tarrasch mentioned we're using a backport for enum.

I wonder then why this error is occurring...

@dlstadther
Copy link
Collaborator

Looks like we reference it in tox.ini

@dlstadther
Copy link
Collaborator

Maybe we need to add it to [testenv:docs] dep list

@Tarrasch
Copy link
Contributor

would it be ok if i replace enum by simple enum-like class to solve this issue? Or there is another way to solve it?

This isn't the beautiful way to do it, but maybe it's still the best so users still can just "download and run" luigi still on python 2.7. You can add a TODO in case we ever stop supporting 2.7 without various backported packages.

@nryanov
Copy link
Contributor Author

nryanov commented Jul 31, 2018

This isn't the beautiful way to do it, but maybe it's still the best so users still can just "download and run" luigi still on python 2.7. You can add a TODO in case we ever stop supporting 2.7 without various backported packages.

I agree with that, especially about python 2.7. supporting (we use luigi in prod using python 2.7 ^_^), but, at this moment, i don't see any options.
I tried to reproduce it locally using py 2.7 (with and without enum34 package) and py3.6 and everything goes well. Probably, i did something wrong as i can't reproduce the same result, but anyway the behaviour is strange.

Maybe, if we add enum package as dependency in [testenv:docs] dep list, as suggested by @dlstadther it will help

@dlstadther
Copy link
Collaborator

My recommendation fixes tests. But Python 2.7 users would still have to install enum34 from Pypi to not have import enum throw errors

@nryanov
Copy link
Contributor Author

nryanov commented Jul 31, 2018

I think that there is still a lot luigi users who use py 2.7 and it will surprise them that after update their should install new package.
On the other hand, installing 1 new package should not cause big problems.
If it will be ok for luigi's users then i don't see any pitfalls

@dlstadther
Copy link
Collaborator

As a Python 2.7 user, I realize that 2020 is coming quicker than I'd like. So i'm personally ok with adding enum34.

Do we know if 3.4+ can install enum34 and still use enum appropriately? If so, why not just add enum34 as a luigi core dependency with a ticking timer for deprecation on 2020-01-01?

@nryanov
Copy link
Contributor Author

nryanov commented Aug 1, 2018

@Tarrasch , @dlstadther i've added enum34 as deps in tox.ini [testenv:docs] and also add this package as dependency in setup.py for python < 3.4.
I did a little research about this:

Do we know if 3.4+ can install enum34 and still use enum appropriately?

And found that it can cause problems in some cases. So, to avoid this, i think, that we could decide to install or not enum34 dynamically

Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gr1f0n6x Thanks for adding in the dynamic enum34 install and fixing the tests! On that front, i'm satisfied.

After another quick review of the entire PR, all i have left is an optional comment about the unnecessary use of a named positional arg. You can chose to ignore it if you'd like.

Assuming @Tarrasch approves, I say merge it!

Thanks for your hard work!

self.params = {}
self.public_params = {}
self.hidden_params = {}
self.set_params(params=params)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nbd, but it's slight overkill to include the named arg here

set_params(params=params) == set_params(params)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔︎

if not task.params:
task.params = _get_default(params, {})
task.set_params(params=params)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔︎

Copy link
Contributor

@Tarrasch Tarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ready for final review?

@@ -48,6 +49,9 @@ def get_static_files(path):
install_requires.remove('python-daemon<3.0')
install_requires.append('sphinx>=1.4.4') # Value mirrored in doc/conf.py

if sys.version_info < (3, 4):
install_requires.append('enum34>1.1.0')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not strongly against this. But it will break for some users who don't install the package the normal way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlstadther, what do you think? Maybe it's time to request users to always install through a package manager, hopefully not many will oppose this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does one use luigi without either PyPi or running setup.py?

So long as people can appropriately use a released version from PyPi or bleeding edge through manual install, I'm 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you could just download the source filles and set your PYTHONPATH=luigi and do python luigi/.xxx.py and get going. For the server you still needed tornado.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think people will complain having to install with a package manager or manually setup.py install/develop

@nryanov
Copy link
Contributor Author

nryanov commented Aug 3, 2018

@Tarrasch , yes, i think that it is ready for review

task = Task()
luigi.build(tasks=[task], workers=2, scheduler_port=self.get_http_port())

time.sleep(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sleeping in tests is not ideal but okay. :)

Copy link
Contributor Author

@nryanov nryanov Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, it's mistake :(
I wanted to add delay only for remote scheduler and only to guarantee that all messages were delivered, because i thought that there is a chance that message will not be delivered


task = TestTask3()

self.assertEqual(task.to_str_params(), {'param_one': '1', 'param_two': '2', 'param_three': '3'})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What practice do you follow to put out spaces in tests by the way? Some like the Arrange Act Assert style. We have no selected style in luigi so this is ok.

Copy link
Contributor Author

@nryanov nryanov Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, using additional space i want to emphasize the assertion block.
Just saw this style somewhere ^_^

@Tarrasch Tarrasch merged commit c9ed761 into spotify:master Aug 8, 2018
@Tarrasch
Copy link
Contributor

Tarrasch commented Aug 8, 2018

Aaaaaaawesome!!! We've wanted this for many years already! So nice to see it finally completed!

thisiscab pushed a commit to glossier/luigi that referenced this pull request Aug 8, 2018
dlstadther added a commit to dlstadther/luigi that referenced this pull request Aug 14, 2018
* upstream-master: (82 commits)
  S3 client refactor (spotify#2482)
  Rename to rpc_log_retries, and make it apply to all the logging involved
  Factor log_exceptions into a configuration parameter
  Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478)
  Add a visiblity level for luigi.Parameters (spotify#2278)
  Add support for multiple requires and inherits arguments (spotify#2475)
  Add metadata columns to the RDBMS contrib (spotify#2440)
  Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477)
  tests: Use RunOnceTask where possible (spotify#2476)
  Optional TOML configs support (spotify#2457)
  Added default port behaviour for Redshift (spotify#2474)
  Add codeowners file with default and specific example (spotify#2465)
  Add Data Revenue to the `blogged` list (spotify#2472)
  Fix Scheduler.add_task to overwrite accepts_messages attribute. (spotify#2469)
  Use task_id comparison in Task.__eq__. (spotify#2462)
  Add stale config
  Move github templates to .github dir
  Fix transfer config import (spotify#2458)
  Additions to provide support for the Load Sharing Facility (LSF) job scheduler (spotify#2373)
  Version 2.7.6
  ...
dlstadther added a commit to dlstadther/luigi that referenced this pull request Aug 14, 2018
* upstream-master:
  S3 client refactor (spotify#2482)
  Rename to rpc_log_retries, and make it apply to all the logging involved
  Factor log_exceptions into a configuration parameter
  Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478)
  Add a visiblity level for luigi.Parameters (spotify#2278)
  Add support for multiple requires and inherits arguments (spotify#2475)
  Add metadata columns to the RDBMS contrib (spotify#2440)
  Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477)
  tests: Use RunOnceTask where possible (spotify#2476)
  Optional TOML configs support (spotify#2457)
  Added default port behaviour for Redshift (spotify#2474)
  Add codeowners file with default and specific example (spotify#2465)
  Add Data Revenue to the `blogged` list (spotify#2472)
dlstadther added a commit to dlstadther/luigi that referenced this pull request Aug 16, 2018
* upstream-master:
  Remove long-deprecated scheduler config variable alternatives (spotify#2491)
  Bump tornado milestone version (spotify#2490)
  Update moto to 1.x milestone version (spotify#2471)
  Use passed password when create a redis connection (spotify#2489)
  S3 client refactor (spotify#2482)
  Rename to rpc_log_retries, and make it apply to all the logging involved
  Factor log_exceptions into a configuration parameter
  Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478)
  Add a visiblity level for luigi.Parameters (spotify#2278)
  Add support for multiple requires and inherits arguments (spotify#2475)
  Add metadata columns to the RDBMS contrib (spotify#2440)
  Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477)
  tests: Use RunOnceTask where possible (spotify#2476)
  Optional TOML configs support (spotify#2457)
  Added default port behaviour for Redshift (spotify#2474)
  Add codeowners file with default and specific example (spotify#2465)
  Add Data Revenue to the `blogged` list (spotify#2472)
  Fix Scheduler.add_task to overwrite accepts_messages attribute. (spotify#2469)
  Use task_id comparison in Task.__eq__. (spotify#2462)
@cabhi
Copy link

cabhi commented Sep 21, 2018

I can still see parameter with visibility set to PRIVATE in luigi execution summary

Sample Flow

import luigi
from luigi import LocalTarget
from luigi.parameter import ParameterVisibility

class PTest(luigi.Task):
a_var = luigi.Parameter(significant=False, visibility=ParameterVisibility.PRIVATE)

def requires(self):
    return PtestExternal(p=self.a_var)

def output(self):
    return LocalTarget("hello.txt")

class PtestExternal(luigi.ExternalTask):
p = luigi.Parameter(significant=False, visibility=ParameterVisibility.PRIVATE)

def output(self):
    return LocalTarget(path=self.p)

log.txt

@nryanov
Copy link
Contributor Author

nryanov commented Sep 21, 2018

@cabhi ,the goal was to hide parameters only in WEB part and in database.
You will see all parameters in execution summary, including private and hidden. For me, if you can run code using it sources, than there is no point to hide any parameters, because you can see it anyway.
But if there is needance in hiding parameers also in execution summary, then it can be also done

@nryanov nryanov deleted the parameter_visibility branch September 21, 2018 22:53
@rayrrr rayrrr mentioned this pull request Apr 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants