Multiple output links with the same link label #5278
Comments
Here is the log. Not sure if it is relevant, but I was using the new Pinging @sphuber @chrisjsewell @ramirezfranciscof in case they have an idea of what might be going on. |
At a quick guess, it is not handling this exception "correctly". I note the only place this exception is raised in core is in |
The order of creation (at least by pk) is a bit weird though; each output node is duplicated before the next one is added
Another thing of note: aiida-core/aiida/backends/sqlalchemy/models/node.py Lines 186 to 195 in e91f569
This talks about not having a unique constraint on just |
I think that should be possible, but I'm not entirely sure, to be honest. Note for example that we cannot put a unique constraint on |
How do you think this call would lead to the same outputs getting attached multiple times? The Given that the creation of the nodes seems interleaved (as you already noticed), it seems more likely that two workers somehow managed to start working on the same node and both called the parser almost exactly at the same time. This means they both got to add the output nodes and then they both tried to seal the node. I see a lot of That being said, that potentially explains why the code that adds the output nodes is called twice, but it should still have triggered the link validation. But maybe here as well, since they were so close to one another in execution, when the one was storing the node, the second was adding the output and the link validation would not have complained yet because the transaction of the first would not have finished. It would require really precise timing but it certainly is feasible I think |
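The timing argument in the comment above can be illustrated with a minimal sketch. It is not AiiDA code: the `link` table, its columns, the pk values and the helper names are all hypothetical, and the two "workers" are simulated by deterministically interleaving their steps on one in-memory SQLite database. It only shows the general check-then-insert pattern: if both workers validate before either write is visible, both inserts succeed, whereas a database-level unique index (which, per the discussion above, may not be directly applicable to AiiDA's real schema) rejects the second insert.

```python
import sqlite3


def make_db(unique: bool) -> sqlite3.Connection:
    """Create a toy link table (hypothetical schema, not AiiDA's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE link (input_id INTEGER, output_id INTEGER, label TEXT)"
    )
    if unique:
        # Assumption for illustration: one outgoing label per source node.
        conn.execute("CREATE UNIQUE INDEX uq_link ON link (input_id, label)")
    return conn


def label_is_free(conn: sqlite3.Connection, input_id: int, label: str) -> bool:
    """Application-level validation: is this outgoing label still unused?"""
    row = conn.execute(
        "SELECT COUNT(*) FROM link WHERE input_id = ? AND label = ?",
        (input_id, label),
    ).fetchone()
    return row[0] == 0


def simulate(unique: bool):
    """Interleave two workers so both validate before either one writes."""
    conn = make_db(unique)
    calc_pk = 1  # hypothetical pk of the calculation node

    # Step 1: both workers run the validation; neither has written yet.
    ok_a = label_is_free(conn, calc_pk, "output_band")
    ok_b = label_is_free(conn, calc_pk, "output_band")

    # Step 2: both workers act on their (now stale) validation result.
    errors = []
    for ok, output_pk in ((ok_a, 10), (ok_b, 11)):
        if ok:
            try:
                conn.execute(
                    "INSERT INTO link VALUES (?, ?, ?)",
                    (calc_pk, output_pk, "output_band"),
                )
            except sqlite3.IntegrityError as exc:
                errors.append(str(exc))

    count = conn.execute(
        "SELECT COUNT(*) FROM link WHERE label = 'output_band'"
    ).fetchone()[0]
    return count, errors


print(simulate(unique=False))  # validation alone lets both links through
print(simulate(unique=True))   # the unique index rejects the second insert
```

Without the index, the table ends up with two `output_band` links for the same source node, which matches the symptom reported in this issue; with the index, only one link is stored and the second worker gets an `IntegrityError`.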
Thanks both for the quick answers. I also think/fear that the issue is with two workers working on the same thing and retrieving/parsing the results at the same time. As I mentioned in #5105 - we probably need to work on dropping RMQ soon and redesigning all that part, but this will definitely take time and maybe we should anyway release 2.0 first. Maybe it's better to add in 2.0 a strong check on the RMQ version at every |
I would definitely not try and hold up v2.0 for this. It is not something we will be introducing with v2.0 anyway, it already existed ever since v1.0. But I also agree that this is a critical problem that we need to tackle a.s.a.p. |
@edan-bainglass this seems to be a P.S. picked your name from here |
Yes, I think this was most likely due to multiple workers working on the same node. If RMQ was too recent, as Gio mentioned (but didn't fix the server configuration), this is quite likely and there is nothing we can do. Anyway, I am closing this for now.
While testing the current develop branch (commit ff1318b) I encountered this very unexpected thing: at least one of my CalcJobs got multiple nodes attached with the same link name. See below:
This, besides breaking a number of assumptions in AiiDA, also makes certain commands raise. For example, n.outputs.output_trajectory raises KeyError: "duplicate label 'output_band' in namespace ''".
Any idea of what has changed that can cause this behaviour? I'm adding this to the v2.0 milestone as I think this is a critical bug.
I'm not sure how I can provide more debug information on what happened (I can provide the AiiDA daemon logs, though)
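As one possible piece of debug information, the duplicated labels could be listed explicitly. The sketch below is only an illustration on plain data: it assumes the outgoing links of the node have already been collected as (label, pk) pairs by some means not shown here, and the pk values and the helper name `find_duplicate_labels` are made up.

```python
from collections import defaultdict


def find_duplicate_labels(links):
    """Return {label: [pks]} for every label attached more than once."""
    by_label = defaultdict(list)
    for label, pk in links:
        by_label[label].append(pk)
    return {label: pks for label, pks in by_label.items() if len(pks) > 1}


# Hypothetical (label, pk) pairs for the outputs of one CalcJob node.
links = [
    ("retrieved", 100),
    ("output_band", 101),
    ("output_band", 102),
    ("output_trajectory", 103),
]

duplicates = find_duplicate_labels(links)
print(duplicates)
```

Reporting the pks alongside each duplicated label would also show whether the duplicates were created in an interleaved order.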