Nonetype Entries Break Individual Image Metadata #406

Open
PaulHuwe opened this issue Oct 16, 2024 · 6 comments

@PaulHuwe
Collaborator

None values in Metadata can cause validation errors:

def mk_rcs(**kwargs):
  ...
  meta.rcs["bank"] = kwargs.get("bank", None)
  meta.rcs["led"] = kwargs.get("led", None)
  ...

Multiple files are then added to a mosaic via:

wfi_mosaic_model.append_individual_image_meta(wfi_imageX.meta)

Then trying to validate:

wfi_mosaic_model.validate()

Yields:

dtype = dtype('O'), include_byteorder = True, override_byteorder = None

    def numpy_dtype_to_asdf_datatype(dtype, include_byteorder=True, override_byteorder=None):
        dtype = np.dtype(dtype)
        if dtype.names is not None:
            fields = []
            for name in dtype.names:
                field = dtype.fields[name][0]
                d = {}
                d["name"] = name
                field_dtype, byteorder = numpy_dtype_to_asdf_datatype(field, override_byteorder=override_byteorder)
                d["datatype"] = field_dtype
                if include_byteorder:
                    d["byteorder"] = byteorder
                if field.shape:
                    d["shape"] = list(field.shape)
                fields.append(d)
            return fields, numpy_byteorder_to_asdf_byteorder(dtype.byteorder, override=override_byteorder)
    
        if dtype.subdtype is not None:
            return numpy_dtype_to_asdf_datatype(dtype.subdtype[0], override_byteorder=override_byteorder)
    
        if dtype.name in _datatype_names:
            return dtype.name, numpy_byteorder_to_asdf_byteorder(dtype.byteorder, override=override_byteorder)
    
        if dtype.name == "bool":
            return "bool8", numpy_byteorder_to_asdf_byteorder(dtype.byteorder, override=override_byteorder)
    
        if dtype.name.startswith("string") or dtype.name.startswith("bytes"):
            return ["ascii", dtype.itemsize], "big"
    
        if dtype.name.startswith("unicode") or dtype.name.startswith("str"):
            return (
                ["ucs4", int(dtype.itemsize / 4)],
                numpy_byteorder_to_asdf_byteorder(dtype.byteorder, override=override_byteorder),
            )
    
        msg = f"Unknown dtype {dtype}"
>       raise ValueError(msg)
E       ValueError: Unknown dtype object

/opt/anaconda3/envs/romancal/lib/python3.12/site-packages/asdf/tags/core/ndarray.py:148: ValueError
@schlafly
Collaborator

Thanks, this is important. We should review what fields currently allow Nones that go into the individual image metadata.
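
One way such a review could start is sketched below: walk each image's metadata and list the key paths whose value is None. This assumes the metadata has already been converted to plain nested dicts; `find_none_paths` and the example dictionary (including the `active` and `instrument` entries) are hypothetical and not part of the existing code.

```python
def find_none_paths(meta, prefix=""):
    """Return dotted key paths in a nested dict whose value is None."""
    paths = []
    for key, value in meta.items():
        path = f"{prefix}.{key}" if prefix else key
        if value is None:
            paths.append(path)
        elif isinstance(value, dict):
            # Recurse into nested sections such as meta.rcs
            paths.extend(find_none_paths(value, path))
    return paths


# Hypothetical stand-in for one image's metadata (plain dicts, not real model nodes)
example_meta = {
    "rcs": {"bank": None, "led": None, "active": False},
    "instrument": {"name": "WFI"},
}

none_paths = find_none_paths(example_meta)
print(none_paths)
```

Running this over every input image would give a list of fields to check against the schemas.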

@braingram
Collaborator

What do folks think about using a list (or dict) of dicts instead of a table for the individual metadata? Something like:

level_2_metadata = []
for level_2_model in level_2_models:
    # do stuff with model (resample, etc)
    level_2_metadata.append(dict(level_2_model.meta))
level_3_model.meta.level_2_metadata = level_2_metadata

The schemas aren't quite organized in a way that would easily allow reuse of just the meta from level 2, but it might be possible to reorganize them to include something like:

level_2_metadata:
  type: array
  items:
    $ref: level_2_metadata

@schlafly
Collaborator

The alternative proposal we had once discussed was for this to be a list of the input L2 metadata; i.e., essentially what you have without the dict() wrapping level_2_model.meta.

The tables are a quality of life improvement over that, but have required more maintenance.

Is your objective here primarily to remove the astropy tables as potentially problematic for archiving, to make the code more robust, or something else?

@braingram
Collaborator

Thanks! I was mainly asking since the current metadata is quite nested and rich (with custom objects). Mapping this to a table loses this structure.

For a user, I'm not sure which is more convenient or useful. If they're used to the level 2 metadata in level 2 files, the table structure would be different.

For developers, maintaining two structures seems like more work than maintaining one (referenced twice).

If the tables are primarily for users one option might be to make a helper function that takes a list of metadata and produces a table. That would remove the need for a schema to define this table and allow the pipeline to accumulate metadata with little more than an append.
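
A rough sketch of what such a helper might look like, assuming each metadata entry is a plain nested dict. The names `flatten` and `metadata_to_columns` and the example values are hypothetical; this is not an existing function in the package.

```python
def flatten(meta, prefix=""):
    """Flatten a nested dict into {"section.key": value} pairs."""
    flat = {}
    for key, value in meta.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def metadata_to_columns(metadata_list):
    """Turn a list of nested metadata dicts into a dict of equal-length columns."""
    rows = [flatten(meta) for meta in metadata_list]
    # Collect column names in first-seen order so the layout is stable
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    # Keys missing from a given image become None in that row
    return {name: [row.get(name) for row in rows] for name in names}


# Hypothetical accumulated metadata for two input images
level_2_metadata = [
    {"exposure": {"type": "WFI_IMAGE"}, "rcs": {"bank": "1"}},
    {"exposure": {"type": "WFI_IMAGE"}, "rcs": {"bank": None}},
]

columns = metadata_to_columns(level_2_metadata)
print(columns)
```

The column mapping could then be handed to whatever table type users prefer. Note that None entries survive the flattening, so the table builder would still need to decide how to represent them.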

@PaulHuwe
Collaborator Author

We decided on a table because we wanted an easy structure that lets users sort on whichever columns they want and see related data. This was viewed as more important than preserving any nesting.

@PaulHuwe
Collaborator Author

Proposal: somewhere around here

subtable_vals.append(
    [str(subvalue)]
    if isinstance(subvalue, (list, dict, asdf.lazy_nodes.AsdfDictNode, asdf.lazy_nodes.AsdfListNode))
    else [subvalue]
)

replace None values with a valid null type / default value.
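
A minimal sketch of that replacement, applied to one column of collected values. The placeholder values in `DEFAULTS`, the `"None"` string fallback, and the `fill_none` name are all assumptions for illustration; the actual null values would need to be agreed on per field.

```python
import math

# Assumed placeholders per Python type; not an agreed project convention
DEFAULTS = {str: "", bool: False, int: -999999, float: math.nan}


def fill_none(values):
    """Replace None entries with a placeholder matching the column's other values."""
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to infer a type from: fall back to a string placeholder
        return ["None"] * len(values)
    default = DEFAULTS.get(type(present[0]), "None")
    return [default if v is None else v for v in values]


print(fill_none(["A", None, "B"]))
print(fill_none([None, 3]))
```

Inferring the placeholder from the other rows keeps each column a single type, which is what lets it serialize as something other than an object array. A column that is None for every image has no type to infer from, which is why the sketch falls back to a string there.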


4 participants