Additional write modes for `WeldxFile` #690

vhirtham · 2022-01-27T14:32:55Z

Currently, we only have the modes r and for reading and rw for creating a new file if it doesn't exist or modifying a file if it does exist.
I suggest that we also add w to only write if the file doesn't exist and wf to overwrite an existing file.

Reason: When developing a file schema, the rw mode can have some undesired side effects. For example, you write a file, and afterward, you adjust the schema to be more strict and wouldn't allow the current data to be written. The next write attempt gives you the expected schema validation error. You correct the data and try to write it again, but you get the same validation error, even if the data seems valid. Why? Because WeldxFile reads the old file first, that was valid before the schema was updated. If you don't realize this, you have a rough time figuring out why your correct data still causes a validation error (or at least seems to).

So using w mode would raise a meaningful exception if the file already exists while wf (force) simply overwrites the old file without opening it.

The text was updated successfully, but these errors were encountered:

marscher · 2022-01-27T14:59:15Z

Thanks for bringing this up, but ain't that a real corner case?
I do not understand, why the third write (how did you actually do that, since there are several write mechanisms) would trigger the existing file to be read again.

If we consider implementing more file handling modes, I suggest we stick to the nomenclature of the builtin open function: https://docs.python.org/3/library/functions.html#open

vhirtham · 2022-01-28T08:24:14Z

I do not understand, why the third write (how did you actually do that, since there are several write mechanisms) would trigger the existing file to be read again.

If you restart the script, then the file will be read and validated again in "rw" mode before you get to the writing part. Due to the asdf caching, you can't even test schema modifications without restarting the script.

but ain't that a real corner case?

My case might be a corner case, but there might also be situations where a user doesn't want to modify a file and accidentally picks an existing name. In this case, an exception would be nice before it is too late ;)

If we consider implementing more file handling modes, I suggest we stick to the nomenclature of the builtin open function: https://docs.python.org/3/library/functions.html#open

👍

marscher · 2022-01-28T11:11:47Z

If a file exists and the user does not want to modify it, he or she should have opened it with mode="r" (the default). So there is already some sanity checks about accidentally modifying files. What am I missing?

vhirtham · 2022-01-28T11:22:57Z

If a file exists and the user does not want to modify it, he or she should have opened it with mode="r" (the default). So there is already some sanity checks about accidentally modifying files. What am I missing?

That the user might want to write a new file without accidentally overwriting stuff in already existing files just because he forgot to increase a number or whatever else might be the reason he used an existing file name ;)

CagtayFabry · 2022-02-01T21:14:39Z

I guess the problem comes down to this example:

# create a valid weldx file
with WeldxFile("debug.wx", tree={"a":42}, mode="rw") as wxfile:
    pass

# try creating the file again with new contents
with WeldxFile("debug.wx", tree={"b":19}, mode="rw", custom_schema="single_pass_weld-0.1.0") as wxfile:
    pass

>>> ValidationError: 'process' is a required property
>>> ...
>>> On instance:
>>>    {'a': 42, .... }

This might be a bit confusing since the validation error happens when reading the existing file. Maybe we can catch that case and raise a more specific error? To let the users know that the existing file is the problem (and they can delete it if they really want to create a file from scratch)

marscher · 2022-02-02T08:06:30Z

It is not clear weather the ValidationError is raised upon writing the new tree (b=19) or reading the existing data, which also violates the schema.
Anyhow we should add a truncation flag as in Python open for advanced users.

CagtayFabry · 2022-02-02T08:14:06Z

It is not clear weather the ValidationError is raised upon writing the new tree (b=19) or reading the existing data, which also violates the schema. Anyhow we should add a truncation flag as in Python open for advanced users.

To me it looks like the ValidationError is created upon reading in the example since the data {'a': 42, is listed in the exception

But to be sure we should certainly run a test with valid data (or a different schema) when trying the second write operation

marscher · 2022-02-02T08:16:15Z

We could also specify, if the schema validation should happen upon opening or writing (or both).

marscher · 2022-02-03T20:42:58Z

In general I'm a great fan of "fail as early as possible", since it makes error tracking much easier. In this case this pattern just prohibits updating the file, if the schema is not being fulfilled. So besides of adding new file modes, I suggest to add a flag "early_validation", which is should be set to True by default (IMHO). This way it would be possible to load a non conforming files, update the tree and validate it later on during serialization/writing. I would also explicitly state, that this flag is most useful for schema development.

What do you think?

Anyhow I think this file modes is the one you are basically referring to achieve the overwrite the contents:

"w+" - open for writing, truncating the file first

CagtayFabry · 2022-02-04T09:18:48Z

What do you think of the following approach:

we expand the allowed inputs for custom_schema to support a pair of schema files with this behavior

custom_schema = None # no validation (default)
custom_schema = "A" # validate against A on read and write (current behavior)
custom_schema = ["A", "B"] # validate with A on read, validate with B on write
custom_schema = [None, "B"] # no validation on read, validate with B on write
custom_schema = ["A", None] # validate with A on read, no validation on write

This would provide a neat wrapper when transforming/extracting data between different schemas

With this implementing w+ would be similar to [None, "B"]

marscher · 2022-02-04T12:25:40Z

That looks very useful. I will open up a PR for this.

vhirtham added ASDF everything ASDF related (python + schemas) feature new API features validation custom ASDF validators file weldx file handling (WeldxFile) labels Jan 27, 2022

marscher mentioned this issue Feb 7, 2022

split read/write schema validation #697

Merged

5 tasks

marscher closed this as completed in #697 Feb 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Additional write modes for `WeldxFile` #690

Additional write modes for `WeldxFile` #690

vhirtham commented Jan 27, 2022

marscher commented Jan 27, 2022

vhirtham commented Jan 28, 2022

marscher commented Jan 28, 2022

vhirtham commented Jan 28, 2022

CagtayFabry commented Feb 1, 2022

marscher commented Feb 2, 2022

CagtayFabry commented Feb 2, 2022

marscher commented Feb 2, 2022

marscher commented Feb 3, 2022

CagtayFabry commented Feb 4, 2022

marscher commented Feb 4, 2022

Additional write modes for WeldxFile #690

Additional write modes for WeldxFile #690

Comments

vhirtham commented Jan 27, 2022

marscher commented Jan 27, 2022

vhirtham commented Jan 28, 2022

marscher commented Jan 28, 2022

vhirtham commented Jan 28, 2022

CagtayFabry commented Feb 1, 2022

marscher commented Feb 2, 2022

CagtayFabry commented Feb 2, 2022

marscher commented Feb 2, 2022

marscher commented Feb 3, 2022

CagtayFabry commented Feb 4, 2022

marscher commented Feb 4, 2022

Additional write modes for `WeldxFile` #690

Additional write modes for `WeldxFile` #690