NoneType Entries Break Individual Image Metadata #406

PaulHuwe · 2024-10-16T21:18:20Z

None values in Metadata can cause validation errors:

def mk_rcs(**kwargs):
  ...
  meta.rcs["bank"] = kwargs.get("bank", None)
  meta.rcs["led"] = kwargs.get("led", None)
  ...

Multiple files are then added to a mosaic via:

wfi_mosaic_model.append_individual_image_meta(wfi_imageX.meta)

Then trying to validate:

wfi_mosaic_model.validate()

Yields:

dtype = dtype('O'), include_byteorder = True, override_byteorder = None

    def numpy_dtype_to_asdf_datatype(dtype, include_byteorder=True, override_byteorder=None):
        dtype = np.dtype(dtype)
        if dtype.names is not None:
            fields = []
            for name in dtype.names:
                field = dtype.fields[name][0]
                d = {}
                d["name"] = name
                field_dtype, byteorder = numpy_dtype_to_asdf_datatype(field, override_byteorder=override_byteorder)
                d["datatype"] = field_dtype
                if include_byteorder:
                    d["byteorder"] = byteorder
                if field.shape:
                    d["shape"] = list(field.shape)
                fields.append(d)
            return fields, numpy_byteorder_to_asdf_byteorder(dtype.byteorder, override=override_byteorder)
    
        if dtype.subdtype is not None:
            return numpy_dtype_to_asdf_datatype(dtype.subdtype[0], override_byteorder=override_byteorder)
    
        if dtype.name in _datatype_names:
            return dtype.name, numpy_byteorder_to_asdf_byteorder(dtype.byteorder, override=override_byteorder)
    
        if dtype.name == "bool":
            return "bool8", numpy_byteorder_to_asdf_byteorder(dtype.byteorder, override=override_byteorder)
    
        if dtype.name.startswith("string") or dtype.name.startswith("bytes"):
            return ["ascii", dtype.itemsize], "big"
    
        if dtype.name.startswith("unicode") or dtype.name.startswith("str"):
            return (
                ["ucs4", int(dtype.itemsize / 4)],
                numpy_byteorder_to_asdf_byteorder(dtype.byteorder, override=override_byteorder),
            )
    
        msg = f"Unknown dtype {dtype}"
>       raise ValueError(msg)
E       ValueError: Unknown dtype object

/opt/anaconda3/envs/romancal/lib/python3.12/site-packages/asdf/tags/core/ndarray.py:148: ValueError
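The `dtype('O')` at the top of the traceback suggests that a table column ended up with NumPy's generic object dtype, which the ASDF datatype mapping above does not handle. A minimal sketch of how a single None can cause that, using made-up column values rather than the actual `rcs` contents:

```python
import numpy as np

# Hypothetical column values; the real "bank"/"led" entries may differ.
# A column of homogeneous strings gets a fixed-width unicode dtype.
clean = np.array(["BANK_A", "BANK_B"])

# A single None forces NumPy to fall back to the generic object dtype,
# which is the dtype('O') reported in the traceback.
with_none = np.array(["BANK_A", None])

print(clean.dtype.kind)  # U
print(with_none.dtype)   # object
```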

schlafly · 2024-10-17T02:04:17Z

Thanks, this is important. We should review what fields currently allow Nones that go into the individual image metadata.

braingram · 2024-10-17T12:49:43Z

What do folks think about using a list (or dict) of dicts instead of a table for the individual metadata? Something like:

level_2_metadata = []
for level_2_model in level_2_models:
    # do stuff with model (resample, etc)
    level_2_metadata.append(dict(level_2_model.meta))
level_3_model.meta.level_2_metadata = level_2_metadata

The schemas aren't quite organized in a way that would easily allow reuse of just the meta from level 2, but it might be possible to reorganize them to include something like:

level_2_metadata:
  type: array
  items:
    $ref: level_2_metadata

schlafly · 2024-10-17T12:59:14Z

The alternative proposal we had once discussed was for this to be a list of the input L2 metadata; i.e., essentially what you have without the dict() wrapping the level_2_model.

The tables are a quality-of-life improvement over that, but have required more maintenance.

Is your objective here primarily to remove the astropy tables as potentially problematic for archiving, to make the code more robust, or something else?

braingram · 2024-10-17T13:09:51Z

Thanks! I was mainly asking since the current metadata is quite nested and rich (with custom objects). Mapping this to a table loses this structure.

For a user I'm not sure which is more convenient/useful. If they're used to level 2 metadata in level 2 files the table structure would be different.

For developers, maintaining two structures seems like more work than maintaining one (referenced twice).

If the tables are primarily for users one option might be to make a helper function that takes a list of metadata and produces a table. That would remove the need for a schema to define this table and allow the pipeline to accumulate metadata with little more than an append.
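A rough sketch of what such a helper could look like, assuming the accumulated metadata is a list of plain nested dicts. The names `flatten` and `metadata_to_columns` and the sample values are hypothetical, not part of roman_datamodels:

```python
from collections.abc import Mapping


def flatten(meta, prefix=""):
    """Flatten nested mappings into one level, joining keys with dots."""
    flat = {}
    for key, value in meta.items():
        name = f"{prefix}{key}"
        if isinstance(value, Mapping):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def metadata_to_columns(metadata_list, fill=None):
    """Turn a list of nested metadata dicts into a dict of equal-length columns."""
    rows = [flatten(meta) for meta in metadata_list]
    # Keep column names in first-seen order across all rows.
    names = list(dict.fromkeys(name for row in rows for name in row))
    # Rows missing a key get the fill value so every column has the same length.
    return {name: [row.get(name, fill) for row in rows] for name in names}


# Hypothetical level 2 metadata, for illustration only.
level_2_metadata = [
    {"filename": "a.asdf", "rcs": {"bank": "1", "led": None}},
    {"filename": "b.asdf", "rcs": {"bank": "2", "led": "3"}},
]
columns = metadata_to_columns(level_2_metadata)
```

A dict of columns like this can be handed to a table constructor such as `astropy.table.Table`, so users could still sort by column without the pipeline having to store and validate the table itself.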

PaulHuwe · 2024-10-19T03:41:57Z

We decided on a table because we wanted an easy structure that lets users sort on whichever columns they want and see related data. This was viewed as more important than preserving any nesting.

PaulHuwe · 2024-10-21T15:06:43Z

Proposal: somewhere around here

roman_datamodels/src/roman_datamodels/datamodels/_datamodels.py

Lines 97 to 100 in 053b844

            subtable_vals.append(
                [str(subvalue)]
                if isinstance(subvalue, (list, dict, asdf.lazy_nodes.AsdfDictNode, asdf.lazy_nodes.AsdfListNode))
                else [subvalue]

replace None values with a valid null type / default value.
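One way this could look, as a standalone sketch rather than a patch to `_datamodels.py`. The function name `table_cell` and the `"None"` placeholder are assumptions, and the asdf lazy node types from the excerpt are left out so the example runs without asdf installed:

```python
# Hypothetical placeholder; the actual null/default value would need to be
# agreed on and match whatever the schema expects for each column.
NULL_PLACEHOLDER = "None"


def table_cell(subvalue, null=NULL_PLACEHOLDER):
    """Return the one-element list to append for a single metadata value."""
    if subvalue is None:
        # Swap None for a serializable stand-in so the column does not
        # degrade to an object dtype.
        return [null]
    if isinstance(subvalue, (list, dict)):
        # Same behavior as the excerpt: containers are stored as strings.
        return [str(subvalue)]
    return [subvalue]


print(table_cell(None))      # ['None']
print(table_cell({"a": 1}))  # ["{'a': 1}"]
print(table_cell(3))         # [3]
```

A string placeholder only suits string-valued columns; numeric columns would probably need a different default, such as NaN or a sentinel integer.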

PaulHuwe assigned WilliamJamieson Dec 6, 2024
