Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

externalize: Dont register the copied tasks #1975

Merged
merged 1 commit into from
Jan 5, 2017

Conversation

Tarrasch
Copy link
Contributor

This fixes an implementation detail, so that luigi.task.externalize
actually does what it says it does in the documentation. Hence I
hope pretty uncontroversial. This makes it so that calling externalize
don't register a new task. A pretty serious bug.

Now, there's still a caveat, as we're mostly hiding that we do side
affects, if anyone would actually inherit from what is returned by
externalize. That subtask wouldn't be registered either.

Have you tested this? If so, how?

This code is quite easy to get wrong somewhere. Luckily luigi is
decently tested. To make me even more confident I also added a couple of
tests. Including two tests that didn't pass before.

Further. I will try this in production too. Ensuring everything is actually working this time. :)

@mention-bot
Copy link

@Tarrasch, thanks for your PR! By analyzing the history of the files in this pull request, we identified @erikbern, @steenzout and @daveFNbuck to be potential reviewers.

@@ -206,6 +206,24 @@ def run(self):
self.assertIsNotNone(MyTask.run) # Check immutability
self.assertIsNotNone(MyTask().run) # Check immutability

def test_dont_affects_registry(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_externalize_doesnt_affect_the_registry would be more clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

# This first check is just an assumption rather than assertion
self.assertTrue(self.run_locally(['MyTask']))
luigi.task.externalize(MyTask)
self.assertTrue(self.run_locally(['MyTask']))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea here is that MyTask is still not external?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... Yea this code is hard to read. I'm adding a comment.

It tests that self.run_locally(['MyTask']) doesn't encounter any "ambiguous task" issues after running the externalize code.

@@ -206,6 +206,24 @@ def run(self):
self.assertIsNotNone(MyTask.run) # Check immutability
self.assertIsNotNone(MyTask().run) # Check immutability

def test_dont_affects_registry(self):
class MyTask(luigi.Task):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just externalize luigi.Task in these tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think of two reasons:

  • Maybe in the future we don't expose the luigi.Task,
  • While true, I've made externalize have no side effects, but a bug in it could actually affect luigi.Task, and then subsequent tests could start to fail if they are run after this test.

name = cls.get_task_family()

if name in reg and reg[name] != cls and \
reg[name] != cls.AMBIGUOUS_CLASS and \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're looking at this code, it seems that if we register the same name three times with different classes, the second registration will turn it into cls.AMBIGUOUS_CLASS and the third registration will hit the else clause here and register itself. That doesn't seem right. Maybe this line should go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. I added a test case. for this. You were right, the current behavior is broken (see soon up and coming update to PR) :)


if name in reg and reg[name] != cls and \
reg[name] != cls.AMBIGUOUS_CLASS and \
not issubclass(cls, reg[name]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this cover the reg[name] != cls check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so. I removed the reg[name] != cls check. Nice catch!

reg[name] = cls.AMBIGUOUS_CLASS
else:
reg[name] = cls
if cls._visible_in_registry:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why didn't you just add this to the existing conditional? It would simplify the diff a bit and avoid extra nesting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually not equivalent. (if it's not visible in registry, we don't want the else clause).

However, I agree we should avoid extra nesting. Made this into a if not blah: continue clause.


return ParentTask

def test_requries_has_effect_run_child(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requires misspelled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

self.assertTrue(self.run_locally_split('ChildTask --my-parameter actuallyset'))
self.assertEqual(ParentTask.class_variable, 'actuallyset')

def test_requries_has_effect_run_parent(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requires again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -14,7 +14,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
from helpers import LuigiTestCase
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you accidentally pulled some unrelated changes to util_test into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually not completely unrelated. Much of the machinery here is relying on the Registry that got changed here.

I'll amend the commit message saying I intentionally added these tests for this change set.

Copy link
Contributor Author

@Tarrasch Tarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big thanks for review. I've addressed all your comments. :)

I'll update the code in a few hours I think.

@@ -206,6 +206,24 @@ def run(self):
self.assertIsNotNone(MyTask.run) # Check immutability
self.assertIsNotNone(MyTask().run) # Check immutability

def test_dont_affects_registry(self):
class MyTask(luigi.Task):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think of two reasons:

  • Maybe in the future we don't expose the luigi.Task,
  • While true, I've made externalize have no side effects, but a bug in it could actually affect luigi.Task, and then subsequent tests could start to fail if they are run after this test.

# This first check is just an assumption rather than assertion
self.assertTrue(self.run_locally(['MyTask']))
luigi.task.externalize(MyTask)
self.assertTrue(self.run_locally(['MyTask']))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... Yea this code is hard to read. I'm adding a comment.

It tests that self.run_locally(['MyTask']) doesn't encounter any "ambiguous task" issues after running the externalize code.

reg[name] = cls.AMBIGUOUS_CLASS
else:
reg[name] = cls
if cls._visible_in_registry:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually not equivalent. (if it's not visible in registry, we don't want the else clause).

However, I agree we should avoid extra nesting. Made this into a if not blah: continue clause.


if name in reg and reg[name] != cls and \
reg[name] != cls.AMBIGUOUS_CLASS and \
not issubclass(cls, reg[name]):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so. I removed the reg[name] != cls check. Nice catch!

@@ -206,6 +206,24 @@ def run(self):
self.assertIsNotNone(MyTask.run) # Check immutability
self.assertIsNotNone(MyTask().run) # Check immutability

def test_dont_affects_registry(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

name = cls.get_task_family()

if name in reg and reg[name] != cls and \
reg[name] != cls.AMBIGUOUS_CLASS and \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. I added a test case. for this. You were right, the current behavior is broken (see soon up and coming update to PR) :)


return ParentTask

def test_requries_has_effect_run_child(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

self.assertTrue(self.run_locally_split('ChildTask --my-parameter actuallyset'))
self.assertEqual(ParentTask.class_variable, 'actuallyset')

def test_requries_has_effect_run_parent(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -14,7 +14,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
from helpers import LuigiTestCase
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually not completely unrelated. Much of the machinery here is relying on the Registry that got changed here.

I'll amend the commit message saying I intentionally added these tests for this change set.

This fixes an implementation detail, so that luigi.task.externalize
actually does what it says it does in the documentation.  Hence I
hope pretty uncontroversial. This makes it so that calling externalize
don't register a new task. A pretty serious bug.

Now, there's still a caveat, as we're mostly hiding that we do side
affects, if anyone would actually inherit from what is returned by
externalize.  That subtask wouldn't be registered either.

This code is quite easy to get wrong somewhere. Luckily luigi is
decently tested. To make me even more confident I also added a couple of
tests:

 * Two tests in task_test.py that didn't pass before.
 * I created the task_register_test.py file.
 * I added some util tests too, because they are also all about
   Copying classes and namespace clashes. And I want to be on the safe
   side.
"""
# We have to do this on-demand in case task names have changed later
# We return this in a topologically sorted list of inheritance: this is useful in some cases (#822)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did the use case in #822 go away?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, since #1182, the Parameter class became (more) immutable, and a lots of these random corner-cases bugs went away. There were also test cases added with the fix for #822, and I assume green Travis means they still pass. :)

@Tarrasch Tarrasch merged commit f7e0b77 into spotify:master Jan 5, 2017
@Tarrasch Tarrasch deleted the fix-externalize branch January 5, 2017 02:13
@Tarrasch
Copy link
Contributor Author

Tarrasch commented Jan 5, 2017

Wohoo, I'm starting to feel ready to do a 2.5 release 😎

kreczko pushed a commit to kreczko/luigi that referenced this pull request Mar 28, 2017
This fixes an implementation detail, so that luigi.task.externalize
actually does what it says it does in the documentation.  Hence I
hope pretty uncontroversial. This makes it so that calling externalize
don't register a new task. A pretty serious bug.

Now, there's still a caveat, as we're mostly hiding that we do side
affects, if anyone would actually inherit from what is returned by
externalize.  That subtask wouldn't be registered either.

This code is quite easy to get wrong somewhere. Luckily luigi is
decently tested. To make me even more confident I also added a couple of
tests:

 * Two tests in task_test.py that didn't pass before.
 * I created the task_register_test.py file.
 * I added some util tests too, because they are also all about
   Copying classes and namespace clashes. And I want to be on the safe
   side.
This was referenced Jun 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants