-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
externalize: Dont register the copied tasks #1975
Conversation
@Tarrasch, thanks for your PR! By analyzing the history of the files in this pull request, we identified @erikbern, @steenzout and @daveFNbuck to be potential reviewers. |
@@ -206,6 +206,24 @@ def run(self): | |||
self.assertIsNotNone(MyTask.run) # Check immutability | |||
self.assertIsNotNone(MyTask().run) # Check immutability | |||
|
|||
def test_dont_affects_registry(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_externalize_doesnt_affect_the_registry
would be more clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
# This first check is just an assumption rather than assertion | ||
self.assertTrue(self.run_locally(['MyTask'])) | ||
luigi.task.externalize(MyTask) | ||
self.assertTrue(self.run_locally(['MyTask'])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the idea here is that MyTask is still not external?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah... Yea this code is hard to read. I'm adding a comment.
It tests that self.run_locally(['MyTask'])
doesn't encounter any "ambiguous task" issues after running the externalize
code.
@@ -206,6 +206,24 @@ def run(self): | |||
self.assertIsNotNone(MyTask.run) # Check immutability | |||
self.assertIsNotNone(MyTask().run) # Check immutability | |||
|
|||
def test_dont_affects_registry(self): | |||
class MyTask(luigi.Task): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just externalize luigi.Task in these tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think of two reasons:
- Maybe in the future we don't expose the luigi.Task,
- While true, I've made
externalize
have no side effects, but a bug in it could actually affect luigi.Task, and then subsequent tests could start to fail if they are run after this test.
name = cls.get_task_family() | ||
|
||
if name in reg and reg[name] != cls and \ | ||
reg[name] != cls.AMBIGUOUS_CLASS and \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're looking at this code, it seems that if we register the same name three times with different classes, the second registration will turn it into cls.AMBIGUOUS_CLASS and the third registration will hit the else clause here and register itself. That doesn't seem right. Maybe this line should go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. I added a test case. for this. You were right, the current behavior is broken (see soon up and coming update to PR) :)
|
||
if name in reg and reg[name] != cls and \ | ||
reg[name] != cls.AMBIGUOUS_CLASS and \ | ||
not issubclass(cls, reg[name]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this cover the reg[name] != cls
check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think so. I removed the reg[name] != cls
check. Nice catch!
reg[name] = cls.AMBIGUOUS_CLASS | ||
else: | ||
reg[name] = cls | ||
if cls._visible_in_registry: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why didn't you just add this to the existing conditional? It would simplify the diff a bit and avoid extra nesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually not equivalent. (if it's not visible in registry, we don't want the else clause).
However, I agree we should avoid extra nesting. Made this into a if not blah: continue
clause.
|
||
return ParentTask | ||
|
||
def test_requries_has_effect_run_child(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requires misspelled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
self.assertTrue(self.run_locally_split('ChildTask --my-parameter actuallyset')) | ||
self.assertEqual(ParentTask.class_variable, 'actuallyset') | ||
|
||
def test_requries_has_effect_run_parent(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requires again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -14,7 +14,7 @@ | |||
# See the License for the specific language governing permissions and | |||
# limitations under the License. | |||
# | |||
from helpers import LuigiTestCase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like you accidentally pulled some unrelated changes to util_test into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually not completely unrelated. Much of the machinery here is relying on the Registry that got changed here.
I'll amend the commit message saying I intentionally added these tests for this change set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big thanks for review. I've addressed all your comments. :)
I'll update the code in a few hours I think.
@@ -206,6 +206,24 @@ def run(self): | |||
self.assertIsNotNone(MyTask.run) # Check immutability | |||
self.assertIsNotNone(MyTask().run) # Check immutability | |||
|
|||
def test_dont_affects_registry(self): | |||
class MyTask(luigi.Task): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think of two reasons:
- Maybe in the future we don't expose the luigi.Task,
- While true, I've made
externalize
have no side effects, but a bug in it could actually affect luigi.Task, and then subsequent tests could start to fail if they are run after this test.
# This first check is just an assumption rather than assertion | ||
self.assertTrue(self.run_locally(['MyTask'])) | ||
luigi.task.externalize(MyTask) | ||
self.assertTrue(self.run_locally(['MyTask'])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah... Yea this code is hard to read. I'm adding a comment.
It tests that self.run_locally(['MyTask'])
doesn't encounter any "ambiguous task" issues after running the externalize
code.
reg[name] = cls.AMBIGUOUS_CLASS | ||
else: | ||
reg[name] = cls | ||
if cls._visible_in_registry: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually not equivalent. (if it's not visible in registry, we don't want the else clause).
However, I agree we should avoid extra nesting. Made this into a if not blah: continue
clause.
|
||
if name in reg and reg[name] != cls and \ | ||
reg[name] != cls.AMBIGUOUS_CLASS and \ | ||
not issubclass(cls, reg[name]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think so. I removed the reg[name] != cls
check. Nice catch!
@@ -206,6 +206,24 @@ def run(self): | |||
self.assertIsNotNone(MyTask.run) # Check immutability | |||
self.assertIsNotNone(MyTask().run) # Check immutability | |||
|
|||
def test_dont_affects_registry(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
name = cls.get_task_family() | ||
|
||
if name in reg and reg[name] != cls and \ | ||
reg[name] != cls.AMBIGUOUS_CLASS and \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. I added a test case. for this. You were right, the current behavior is broken (see soon up and coming update to PR) :)
|
||
return ParentTask | ||
|
||
def test_requries_has_effect_run_child(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
self.assertTrue(self.run_locally_split('ChildTask --my-parameter actuallyset')) | ||
self.assertEqual(ParentTask.class_variable, 'actuallyset') | ||
|
||
def test_requries_has_effect_run_parent(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -14,7 +14,7 @@ | |||
# See the License for the specific language governing permissions and | |||
# limitations under the License. | |||
# | |||
from helpers import LuigiTestCase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually not completely unrelated. Much of the machinery here is relying on the Registry that got changed here.
I'll amend the commit message saying I intentionally added these tests for this change set.
This fixes an implementation detail, so that luigi.task.externalize actually does what it says it does in the documentation. Hence I hope pretty uncontroversial. This makes it so that calling externalize don't register a new task. A pretty serious bug. Now, there's still a caveat, as we're mostly hiding that we do side affects, if anyone would actually inherit from what is returned by externalize. That subtask wouldn't be registered either. This code is quite easy to get wrong somewhere. Luckily luigi is decently tested. To make me even more confident I also added a couple of tests: * Two tests in task_test.py that didn't pass before. * I created the task_register_test.py file. * I added some util tests too, because they are also all about Copying classes and namespace clashes. And I want to be on the safe side.
00d0769
to
613d804
Compare
""" | ||
# We have to do this on-demand in case task names have changed later | ||
# We return this in a topologically sorted list of inheritance: this is useful in some cases (#822) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did the use case in #822 go away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wohoo, I'm starting to feel ready to do a 2.5 release 😎 |
This fixes an implementation detail, so that luigi.task.externalize actually does what it says it does in the documentation. Hence I hope pretty uncontroversial. This makes it so that calling externalize don't register a new task. A pretty serious bug. Now, there's still a caveat, as we're mostly hiding that we do side affects, if anyone would actually inherit from what is returned by externalize. That subtask wouldn't be registered either. This code is quite easy to get wrong somewhere. Luckily luigi is decently tested. To make me even more confident I also added a couple of tests: * Two tests in task_test.py that didn't pass before. * I created the task_register_test.py file. * I added some util tests too, because they are also all about Copying classes and namespace clashes. And I want to be on the safe side.
This fixes an implementation detail, so that luigi.task.externalize
actually does what it says it does in the documentation. Hence I
hope pretty uncontroversial. This makes it so that calling externalize
don't register a new task. A pretty serious bug.
Now, there's still a caveat, as we're mostly hiding that we do side
affects, if anyone would actually inherit from what is returned by
externalize. That subtask wouldn't be registered either.
Have you tested this? If so, how?
This code is quite easy to get wrong somewhere. Luckily luigi is
decently tested. To make me even more confident I also added a couple of
tests. Including two tests that didn't pass before.
Further. I will try this in production too. Ensuring everything is actually working this time. :)