-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Output metadata #162
Output metadata #162
Conversation
src/eko/output/manipulate.py
Outdated
@@ -175,5 +186,9 @@ def to_evol(eko: EKO, source: bool = True, target: bool = False): | |||
# assign pids | |||
if source: | |||
eko.rotations._inputpids = br.evol_basis_pids | |||
# update metadata | |||
eko.update_metadata({"rotations": {"inputpids": inputpids}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is in contradiction with the line above ... actually the problem is more complicated: because for to_evol
we were putting the saved inputpids
back to a list - instead the inputpids
for for flavor_reshape
is a matrix as it should. I think we can retain that behaviour but let's think for a moment ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point I believe that in any case rotations.inputpids
and rotations.targetpids
should be matrices, not only for to_evol
. The question is, do we want to store the matrix itself inside the file metadata
or do we want to store the vector and then internally change the object (for example rotations.targetpids
)? In the second case we can add something in __post__init__
: Since we have both the vectors rotations.pids
and rotations.targetpids
as readed from metadata
we can just compute the matrix and assign it to rotations.targetgrid
in place of the vector. In this way, we will only store vectors, which I believe are more clear to read for an user, but then internally we have the matrices we need. Do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed yesterday, we want the matrix: it is uniform, also for non-standard bases, and always reliable.
The less special cases we have, the easier will be the code, and maintainability will improve.
src/eko/output/manipulate.py
Outdated
ops = np.einsum("ajbk,bd->ajdk", ops, inv_inputpids) | ||
errs = np.einsum("ajbk,bd->ajdk", errs, inv_inputpids) | ||
else: | ||
# update metadata | ||
eko.update_metadata({"rotations": {"inputpids": inputpids}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the call should move down to were the write is happening (else we get out of sync) .. note that we're casting to list (see comment below )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This made sense: we want operations to be as atomic as possible (in the ACID sense)
Some of the tests we are not passing now are failing because the |
Problem is not dumping, but loading: if you dump them as Numpy arrays, it will (since YAML can dump custom objects). But then when you load the "custom object" is not matching any longer (because of the scope and stuffs like this), thus it will fail. In any case, we want basic YAML, not custom objects, so you want to convert any array in a list of lists of lists... Lines 89 to 90 in dcb40cb
|
Yes indeed, this is what I am doing :) |
If you use Make sure you are using |
The solution that I pushed is very bad but I needed something working in order to test the rest. I will push a better solution soon. However, it seems that there is no way using the |
@@ -11,7 +11,7 @@ def test_apply(self, fake_legacy, fake_pdf): | |||
pdf_grid = apply.apply_pdf(o, fake_pdf) | |||
assert len(pdf_grid) == len(fake_card["Q2grid"]) | |||
pdfs = pdf_grid[q2_out]["pdfs"] | |||
assert list(pdfs.keys()) == o.rotations.targetpids | |||
assert list(pdfs.keys()) == list(o.rotations.targetpids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case, you can also assert they are the same despite the iterables even without converting both to lists:
assert list(pdfs.keys()) == list(o.rotations.targetpids) | |
assert all(x == y for x, y in zip(pdfs.keys(), o.rotations.targetpids)) |
No special advantage here (you save some memory and time, but it is definitely negligible here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it back: it has the disadvantage of not checking all the elements in the longest iterable, in case the length is different, so you'd need to split into a separate check.
assert list(pdfs.keys()) == list(o.rotations.targetpids) | |
assert len(pdfs.keys()) == len(o.rotations.targetpids)) | |
assert all(x == y for x, y in zip(pdfs.keys(), o.rotations.targetpids)) |
Either we write a function for this, to consistently use, or it is only more cumbersome (for a negligible advantage).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once more (I believe I told you somewhere else), .keys()
is not required when you iterate on it, and list()
is iterating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes i know but for the moment I wanted to keep the changes to the tests as minimal as possible. I will drop .keys
now. :)
So now the tests are working again. This does not mean that we can merge unfortunately. I still need to do a couple of things:
If you believe there is something else, please add here. |
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## Fix_struct_detached #162 +/- ##
======================================================
Coverage ? 99.93%
======================================================
Files ? 96
Lines ? 4515
Branches ? 0
======================================================
Hits ? 4512
Misses ? 3
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more. |
@alecandido @felixhekhorn This should now be roughly what we want. I did not tick the second point of the todo list ("Read the metadata file in order to assign the correct value to eko.rotations when an eko is loaded.") yet because I am still checking if everything is correct but I believe it is already working. Then I will only need to change the name of "inputpids" to something more telling but that should be trivial. Let me know if you have suggestions. PS: At the end (thanks to @alecandido's help) I managed to update the metadata file without the other PR (#152) |
theory, | ||
op, | ||
nt, | ||
no, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know they mean new_theory_card
and new_operator_card
, but it is a bit cryptic.
Please, favor more telling names. Trade-offs are best:
- ❌
nt
- too short, ambiguous - ❌
new_theory_card_evolve_single_member
- redundant, it is already in the scope ofevolve_single_member
- ✔️
new_theory_card
- just perfect :)
The minimal amount of information that saves you an explicit comment is the optimal choice, according to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually copied the name that was already there. Should I change also the others nt
? (and similar for no
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not top priority, but I believe we should constantly improve also these details, they make it simpler for new developers and ourselves to maintain.
If you can improve something is already better than nothing, as I said it is not high-priority, don't waste too much time. But at least change this occurrence, such that the new code has better and better style :)
@@ -61,6 +61,8 @@ def xgrid_reshape( | |||
) | |||
target_rot = b.get_interpolation(targetgrid.raw) | |||
eko.rotations._targetgrid = targetgrid | |||
# update metadata | |||
eko.update_metadata({"rotations": {"_targetgrid": targetgrid}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you need the underscore?
I know the attribute is that one, the other being a property, but it'd look nicer without, for manual inspection.
At the end of the day, the initial underscore discriminate between the virtual property, and the actual representation, but when we serialize there is only the representation and nothing more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that at the end the metadata file should be read and used to construct a Rotations
object. In order to do that it expects _targetgrid
and similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I was proposing to change what Rotations
expects, but this we can discuss together, it's easier :)
@@ -91,7 +92,7 @@ def raw(self): | |||
elif isinstance(value, float): | |||
value = float(value) | |||
elif isinstance(value, interpolation.XGrid): | |||
value = value.dump() | |||
value = value.dump()["grid"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm, this is weird: the serialization of XGrid
should be fully managed by the class itself.
So, we should always save what .dump()
returns, and XGrid
itself should be able to be loaded from the whole return value of a .dump()
.
If this is not happening, we should fix on XGrid
side, not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes indeed, the point is that dump()
return a dictionary like {'grid':[...], 'log':[...]} where the actual grid is inside grid
. For the tests, sometimes the log
part is used and sometimes it is not (and this is causing errors). So I was not sure on how to fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make sure that the log
part is always used, or ignored, on the consumer side
try: | ||
return f"o{operators_card['hash'][:6]}_t{theory_card['hash'][:6]}.tar" | ||
except KeyError: | ||
return "o000000_t000000.tar" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, where does this hard-coded string come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe re-raising a more explicit error would be more appropriate...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that in the new format we don't have hash
anymore. So, even in this case, we need to choose how to fix. This was a temporary solution just to have tests working but of course we need to make a decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hashes are needed for ekomark
, but inside ekobox
there should be no need for them, nor we should expose them to the user.
Let's keep them for the benchmarks, where they are actually used, and completely drop this function here.
@@ -130,7 +130,9 @@ class TestEKO: | |||
def _default_cards(self): | |||
t = tc.generate(0, 1.0) | |||
o = oc.generate([10.0]) | |||
return compatibility.update(t, o) | |||
nt, no = compatibility.update(t, o) | |||
no["rotations"]["pids"] = no["rotations"]["targetpids"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we put this in compatibility.update
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This I don't know, I believe it is something very specific to this test. Most of the case we do not need this because [rotations][pids]
will be already there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, then maybe upgrade default cards, such that we don't need it also here :)
In the end, default cards are for the users (as everything else in ekobox
), we don't want to generate incompatible ones :)
Co-authored-by: Alessandro Candido <candido.ale@gmail.com>
Benchmarks are broken:
|
you also need to dump version informations (i.e. both program and data) to the metadata (else we get the usual problems) (here the additional scope we added already becomes handy) . This is also relevant for a converter (e.g. the one in #171). |
Don't mind: this I'm doing in #172 |
We want to have a file called
metadata.yaml
in which we can store the relevant metadata for the operators as thetargetgrid
, theinputgrid
, thetargetpids
and theinputpids
.Moreover we want to move the
operator_card
and thetheory_card
inside a folderlog
in which we will also store all the logs of the functions acting on the operator, in order to ensure reproducibility. @felixhekhorn @alecandido