Add TemplateCodeInfo for code info in job template only #5350
Codecov Report
@@             Coverage Diff             @@
##           develop    #5350      +/-   ##
===========================================
+ Coverage    79.57%   82.13%   +2.57%
===========================================
  Files          517      533      +16
  Lines        36982    38504    +1522
===========================================
+ Hits         29424    31623    +2199
+ Misses        7558     6881     -677
Thanks @unkcpz. I have a few suggestions. The tests are failing because the JobTemplate that is generated by the engine is written to the repository of the CalcJobNode by dumping it to JSON. The old CodeInfo was a simple dict and so serializable, but the new dataclasses are not JSON serializable.
In [1]: import json
In [2]: from dataclasses import dataclass
In [4]: @dataclass
   ...: class Test:
   ...:     a: int = 1
   ...:
In [5]: a = Test()
In [6]: a.a
Out[6]: 1
In [7]: json.dumps(a)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-2f50cf32d976> in <module>
----> 1 json.dumps(a)
/usr/lib/python3.9/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
229 cls is None and indent is None and separators is None and
230 default is None and not sort_keys and not kw):
--> 231 return _default_encoder.encode(obj)
232 if cls is None:
233 cls = JSONEncoder
/usr/lib/python3.9/json/encoder.py in encode(self, o)
197 # exceptions aren't as detailed. The list call should be roughly
198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
200 if not isinstance(chunks, (list, tuple)):
201 chunks = list(chunks)
/usr/lib/python3.9/json/encoder.py in iterencode(self, o, _one_shot)
255 self.key_separator, self.item_separator, self.sort_keys,
256 self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)
258
259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
/usr/lib/python3.9/json/encoder.py in default(self, o)
177
178 """
--> 179 raise TypeError(f'Object of type {o.__class__.__name__} '
180 f'is not JSON serializable')
181
TypeError: Object of type Test is not JSON serializable
However, you can turn it into a dictionary first. So in line 763 of CalcJob.presubmit, turn
subfolder.create_file_from_filelike(io.StringIO(json.dumps(job_tmpl)), 'job_tmpl.json', 'w', encoding='utf8')
into
from dataclasses import asdict
subfolder.create_file_from_filelike(io.StringIO(json.dumps(asdict(job_tmpl))), 'job_tmpl.json', 'w', encoding='utf8')
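For illustration, here is a minimal sketch of the `asdict` route. `DemoCodeInfo` and `DemoJobTemplate` are hypothetical stand-ins, not the AiiDA classes, and the sketch assumes the object passed to `asdict` is itself a dataclass instance (`asdict` raises a `TypeError` for anything else).

```python
import io
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DemoCodeInfo:
    """Hypothetical stand-in for a per-code info dataclass."""
    cmdline_params: List[str] = field(default_factory=list)
    stdin_name: Optional[str] = None


@dataclass
class DemoJobTemplate:
    """Hypothetical stand-in for a job template; NOT the AiiDA class."""
    job_name: str = 'demo'
    codes_info: List[DemoCodeInfo] = field(default_factory=list)


job_tmpl = DemoJobTemplate(codes_info=[DemoCodeInfo(cmdline_params=['-in', 'aiida.in'])])

# asdict() recurses into nested dataclasses, lists and dicts,
# so the result contains only plain JSON-friendly containers.
payload = json.dumps(asdict(job_tmpl))

# Round trip through a file-like object, as the presubmit step does.
handle = io.StringIO(payload)
restored = json.loads(handle.read())
```

After the round trip, `restored` is a plain dict: the dataclass types are not recovered, only their field values.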
Thanks @sphuber. I checked the failed test caused by
Err, I feel the better solution is just to handle dataclasses at JSON dump time:
import dataclasses
def encoder(obj):
    # Fallback for json.dumps: convert dataclass instances to plain dicts
    if dataclasses.is_dataclass(obj):
        return dataclasses.asdict(obj)
    raise TypeError(f'{obj!r} is not JSON serializable')
subfolder.create_file_from_filelike(io.StringIO(json.dumps(job_tmpl, default=encoder)), 'job_tmpl.json', 'w', encoding='utf8')
But I don't want to overcomplicate the issue, so I'll leave it up to you guys, and maybe open this as a more general issue: I'm very much not a fan of these @sphuber I think one "argument" for using them was dynamic auto-completion but, just out of interest, I found that since ipython/ipython#5304, you can anyhow auto-complete dict keys:
I agree with @chrisjsewell that we should avoid using the custom dict classes and use @unkcpz My suggestion was not to convert the But @chrisjsewell's solution is more elegant and will automatically capture all instances of dataclasses if we were to add more in the future. I do wonder what the reasoning was for not adding default support for dataclasses to the JSON serializer in the standard library. It seems like a very common use case, and in most cases where the dataclass holds only simple base types, it should be trivially serializable.
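As a sketch of why the `default=` hook captures every dataclass automatically: `json.dumps` calls the hook only for objects it cannot encode itself, wherever they sit in the structure. The container below is a plain dict holding a hypothetical `DemoCodeInfo`; none of these names are AiiDA classes, and the extra `isinstance(obj, type)` guard is an addition here, since `is_dataclass` is also true for the class object itself.

```python
import dataclasses
import json
from typing import List


@dataclasses.dataclass
class DemoCodeInfo:
    """Hypothetical dataclass nested inside a dict-like job template."""
    cmdline_params: List[str] = dataclasses.field(default_factory=list)
    stdout_name: str = 'aiida.out'


def encoder(obj):
    """Fallback for json.dumps, called only for objects json cannot encode."""
    # is_dataclass() is also True for dataclass *types*, so exclude those.
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return dataclasses.asdict(obj)
    raise TypeError(f'{obj!r} is not JSON serializable')


# The outer container is a plain dict; only the nested instance needs the hook.
job_tmpl = {'job_name': 'demo', 'codes_info': [DemoCodeInfo(['-np', '4'])]}
dumped = json.dumps(job_tmpl, default=encoder)
loaded = json.loads(dumped)
```

Objects that are neither natively serializable nor dataclass instances still raise `TypeError`, so the hook does not silently hide unexpected types.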
Thanks @chrisjsewell @sphuber. I updated it the way Chris suggested. For the
No, the dataclasses should simply be used whenever we have a simple class that essentially just wraps some data through attributes. This is the case for
@giovannipizzi @chrisjsewell even though the chances are small, this could potentially be breaking if there are scheduler plugins that rely on the type of the |
The AiiDA registry currently contains four packages that add a scheduler plugin: https://github.com/zhubonan/aiida-fireworks-scheduler @unkcpz Can you check whether any of those would be impacted? |
I just checked all the scheduler implementations. None of them override the part that generates the exec command of the script, so it should be pretty safe to go forward. 😄
Great, any objections @sphuber ? |
No, if you rebase the PR and verify that it won't break the schedulers on the plugin registry, then I will approve and merge this. |
In the current design, the code_info of a calc_job is read from the code setup and the plugin, then passed to the job template to create the bash script. However, the job template needs more flexibility to control the different parts of the script run line. Currently all the parts (exec_name from the code uuid, code_info.cmdline_params, and the MPI parameters from the computer settings) are stacked together into the job template's code_info. In this PR, the class `TemplateCodeInfo` is created to handle these elements; the `code_uuid` and `withmpi` fields are not used in job script generation. The code_info of the JobTemplate and that of `CalcJob` are decoupled from each other, which leads to more flexibility.
Hi @sphuber, I rebased the PR and checked the
Thanks @unkcpz, my understanding is this can move forward!
This is another piece required for the containerized code implementation. In the current design, the code_info of a calc_job is read from the code setup and the plugin, then passed to the job template to create the bash script.
However, the job template needs more flexibility to control the different parts of the script run line. Currently all the parts (1. exec_name from the code uuid, 2. code_info.cmdline_params, 3. the MPI parameters from the computer settings) are stacked together into the job template's code_info.
In this PR, the class TemplateCodeInfo is created to handle the elements needed to generate a job script. The code_uuid and withmpi fields are not used in job script generation.
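The decoupling described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `Demo*` classes, their field names (including `prepend_cmdline_params`), and the helper functions are all hypothetical, and the sketch assumes the template-side class simply omits `code_uuid` and `withmpi`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DemoCodeInfo:
    """Plugin-side info (hypothetical subset of fields)."""
    code_uuid: str
    cmdline_params: List[str] = field(default_factory=list)
    withmpi: bool = False
    stdout_name: Optional[str] = None


@dataclass
class DemoTemplateCodeInfo:
    """Template-side info: only what one run line needs (hypothetical names)."""
    prepend_cmdline_params: List[str] = field(default_factory=list)  # e.g. MPI launcher
    cmdline_params: List[str] = field(default_factory=list)  # executable + arguments
    stdout_name: Optional[str] = None


def to_template_info(info, executable, mpi_args):
    """Resolve code_uuid/withmpi up front so the template never sees them."""
    prepend = list(mpi_args) if info.withmpi else []
    return DemoTemplateCodeInfo(
        prepend_cmdline_params=prepend,
        cmdline_params=[executable, *info.cmdline_params],
        stdout_name=info.stdout_name,
    )


def run_line(tmpl_info):
    """Join the parts into a run line (no shell quoting, for brevity)."""
    line = ' '.join(tmpl_info.prepend_cmdline_params + tmpl_info.cmdline_params)
    if tmpl_info.stdout_name:
        line += f' > {tmpl_info.stdout_name}'
    return line


info = DemoCodeInfo(
    code_uuid='hypothetical-uuid',
    cmdline_params=['-in', 'aiida.in'],
    withmpi=True,
    stdout_name='aiida.out',
)
tmpl = to_template_info(info, executable='/usr/bin/pw.x', mpi_args=['mpirun', '-np', '4'])
line = run_line(tmpl)
```

Keeping the launcher prefix and the code's own parameters in separate fields is what would let a template insert something between them, such as a container command, without touching the plugin-side info.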